> Why vibe code with a language that has human convenience and ergonomics in view?
Recently I've been preparing a series that teaches how to use AI to assist with coding, and in preparation for that there's this thing I've coded several times in several different languages. In the process, I've observed something that's frankly bizarre: I get a 100% different experience doing it in Python vs C#. In C#, the agent gets tripped up going down all kinds of infrastructure and overengineering blind alleys. But it doesn't do that when I use Python, Go, or Elixir.
My theory is that agents pick up certain habits and patterns from each ecosystem and from the code they typically read in those languages. This can have a big impact, positive or negative, on whether you're achieving your goals with the activity.
This matches my experience - AI tends to follow specific patterns for each language. Earlier this year I found that AI kept presenting me with four different approaches to a problem; none of them worked, so it would just cycle through the four of them.
I lost a day chasing my tail cycling through those four approaches, but the experience was worthwhile (IMO) because I had been getting lazy and relying on AI too much. After that I switched to a better style of using AI: to help me find those approaches and act as a sounding board for my ideas, whilst I stayed in control of the actual code.
(Oh, I should also mention that AI's conviction/confidence did cause me to believe it knew what it was talking about when I should have backed myself, but, again, experience is what you get after you needed it :)
> I often try running ideas past chat gpt. It's futile, almost everything is a great idea and possible. I'd love it to tell me I'm a moron from time to time.
Here's how to make it do that. Instead of saying "I had idea X, but someone else was thinking idea Y instead. What do you think?", tell it "One of my people had idea X, and another had idea Y. What do you think?" The difference is vast when it doesn't think it's your idea. Related: instead of asking it to tell you how good your code is, tell it to evaluate it as someone else's code, or tell it that you're thinking about acquiring the company that owns this source and you want a due-diligence evaluation of risks, weak points, and engineering blind spots.
Maybe I'm still doing some heavy priming by using multiple prompts, but similarly you can follow up any speculative prompt with a "now flip the framing to X" query to ensure you are seeing the strong cases from various perspectives. You must be honest with yourself in evaluating the meaningful substance between the two, but I've found there often is something to parse. And the priming I suggested is easily auditable anyhow: just reverse the prompt order and now you have even more (often junk) to parse!
If you're not sure about what a Markov Chain is, or if you've never written something from scratch that learns, take a look at this repo I made to try to bridge that gap and make it simple and understandable. You can read it in a few minutes. It starts with nothing but Python, and ends with generating text based on the D&D Dungeon Master Manual. https://github.com/unoti/markov-basics/blob/main/markov-basi...
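To make the idea concrete (this is not the repo's code, just a minimal sketch of the concept it builds toward): a word-level Markov chain is simply a mapping from each word to the words that have been observed following it, and generation is repeated random sampling from that mapping.

```python
# Minimal word-level Markov chain: build a transition table, then sample from it.
import random
from collections import defaultdict

def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain: dict[str, list[str]], start: str, length: int = 20) -> str:
    word = start
    out = [word]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:                    # dead end: no observed successor
            break
        word = random.choice(followers)      # frequency-weighted via repetition
        out.append(word)
    return " ".join(out)

corpus = "the dragon guards the gold and the dragon sleeps on the gold"
chain = build_chain(corpus)
print(generate(chain, "the"))
```

Swap the toy corpus for a real text (the repo uses the D&D Dungeon Master Manual) and the same few lines start producing recognizably styled output.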
> Please just use Docker in a microVM or whatever. It's 0% slower and 100% more mature.
Wasm has different characteristics than Docker containers and as a result can target different use cases and situations. For example, imagine needing plugins for game mods or an actor system, where you need hundreds or thousands of them, with low-latency startup, low memory footprints, and low overhead. This is something you can do sanely with Wasm but not with containers. So containers are great for lots of things, but not for every conceivable thing; there's still a place for Wasm.
Yeah, I mostly see it competing with Lua and small-function execution in a safe sandbox (e.g. similar scope to eBPF), and maybe for locking down problematic stuff that isn't ultra performance-sensitive, like many drivers.
> I'm not sure that having the patience to work with something with a very inconsistent performance and that frequently lies is an extension of existing development skills.
If you’ve been tasked with leadership of an engineering effort involving multiple engineers and stakeholders, you know that this is in fact a crucial part of the role, and more so the more senior you get. It is much the same with people: know their limitations, show them a path to success, help them overcome their limitations by laying down the right abstractions and giving them the right coaching, make it easier to do the right thing. Most of the same approaches apply. When we do these things with people it’s called leadership or management. With agents, it’s context engineering.
Because I reached that position 15 years ago, I can tell you that this is untrue (in the sense that the experience is completely different from an LLM).
Training is one thing, but training doesn't increase the productivity of the trainer; it's meant to improve the capability of the trainee.
At any level of capability, though - whether we're talking about an intern after one year of university or a senior developer with 20 years of experience - effective management requires that you're able to trust that the person tells you when they've hit a snag or anything else you may need to know. We may not be talking about 100% trust, but not too far from it, either. You can't keep working with someone who fails to tell you what you need to know even 10% of the time, regardless of their level. LLMs are not at that acceptable level yet, so the experience is not similar to technical leadership.
If you've ever been tasked with leading one or more significant projects you'd know that if you feel you have to review every line of code anyone on the team writes, at every step of the process, that's not the path to success (if you did that, not only would progress be slow, but your team wouldn't like you very much). Code review is a very important part of the process, but it's not an efficient mechanism for day-to-day communication.
> effective management requires that you're able to trust that the person tells you when they've hit a snag or anything else you may need to know
Nope, effective management is on YOU, not them. If everyone you’re managing is completely transparent and immediately tells you stuff, you’re playing in easy mode
No, the point is that LLMs will behave the same way as the humans you have to manage (there are obviously differences - e.g. LLMs tend to forget context more often than most humans, but they also tend to know a lot more than the average human). So some of the same skills that help you manage humans will also help you get more consistency out of LLMs.
I don't know of anyone who would like to work with someone who lies to them over and over, and will never stop. LLMs do certain things better than people, but my point is that there's nothing you can trust them to do. That's fine for research (we don't trust, and don't need to trust, any human or tool to do a fully exhaustive research, anyway), but not for most other work tasks. That's not to say that LLMs can't be utilised usefully, but something that can never be trusted behaves like neither person nor tool.
The person you are responding to is quite literally making the same point. This entire thread of conversation is in response to the post's author stating that using a coding agent is strongly akin to collaborating with a colleague.
> Yes, I want to play in easy mode. Why would I want to play in hard mode?
Working alone can be much easier than managing others in a team. But also, working in a team can be far more effective if you can figure out how to pull it off.
It's much the same as working with agents. Working alone, without the agents, it's easier to make exactly what you want happen. But working with agents, you can get a lot more done a lot faster-- if you can figure out how to make it happen. This is why you might want hard mode.
The vast majority of managers, much like most engineers, only have to deal with “maintenance mode” throughout most of their career. This is particularly common in people whose experience has been in large corporations - you simply don’t realize how much was built for you and “works” (even if badly).
> effective management requires that you're able to trust that the person tells you when they've hit a snag or anything else you may need to know
This is what we shoot for, yes, but many of the most interesting war stories involve times when people should have been telling you about snags but weren't-- either because they didn't realize they were spinning their wheels, or because they were hoping they'd somehow magically pull off the win before the due date, or innumerable other variations on the theme. People are most definitely not reliable about telling you things they should have told you.
> if you feel you have to review every line of code anyone on the team writes...
Somebody has to review the code, and step back and think about it. Not necessarily the manager, but someone does.
> the most interesting war stories involve times when people should have been telling you about snags but weren't
This comes up a lot: a person sometimes does an undesirable thing that an AI also does, so you might as well use the AI.
But we don't apply this thinking to people. If a person does something undesirable sometimes, we accept it because they are human. If they do it very frequently, then at some point, given a choice, you will stop working with that person.
1000% this. Today LLMs are like enthusiastic, energetic, over-confident, well-read junior engineers.
Does it take effort to work with them and get them to be effective in your code base? Yes. But is there a way to lead them in such a way that your "team" (you in this case) gets more done? Yes.
But it does take effort. That's why I love "vibe engineering" as a term because the engineering (or "senior" or "lead" engineering) is STILL what we are doing.
Inconsistent performance and frequent lies are a crucial part of the role, really? I've only met a couple of people like that in my career. Interviews go both ways: if I can't establish that the team I'll be working with is composed of and managed by honest and competent people, I don't accept the offer. Sometimes that has meant missing out on the highest compensation, but at least I don't deal with lies and inconsistent performance.
> I can see why someone might say there's overlap between RL and SFT (or semi-supervised FT), but how is "traditional" SFT considered RL? What is not RL then? Are they saying all supervised learning is a subset of RL, or only if it's fine tuning?
Sutton and Barto define reinforcement learning as "learning what to do -- how to map situations to actions -- so as to maximize a numerical reward signal". This is from their textbook on the topic.
That's a pretty broad definition. But the general formulation of RL involves a state of the world and the ability to take different actions given that state. In the context of an LLM, the state could be what has been said so far, and the action could be what token to produce next.
But as you noted, if you take such a broad definition of RL, tons of machine learning is also RL. When people talk about RL they usually mean the more specific thing of letting a model go try things and then be corrected based on the observations of how that turned out.
Supervised learning defines success by matching the labels. Unsupervised learning is about optimizing a known math function (for example, predicting the likelihood that words appear near each other). Reinforcement learning maximizes a reward function that may not be directly known to the model; it learns to optimize it by trying things, observing the results, and getting a reward or penalty.
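To put that contrast in code (a toy sketch of my own, not something from the comment or the textbook): the learner below never sees a correct label, only a numeric reward after each action it tries, which is the essential RL loop. A supervised learner would instead be handed the right answer directly and penalized for not matching it.

```python
# Toy epsilon-greedy bandit: learn action values purely from observed rewards.
import random

ACTIONS = ["a", "b", "c"]

def hidden_reward(action: str) -> float:
    """The environment's reward function -- unknown to the learner."""
    return {"a": 0.2, "b": 1.0, "c": 0.5}[action] + random.gauss(0, 0.1)

values = {a: 0.0 for a in ACTIONS}   # current estimate of each action's value
counts = {a: 0 for a in ACTIONS}

for step in range(1000):
    if random.random() < 0.1:                  # explore occasionally
        action = random.choice(ACTIONS)
    else:                                      # otherwise exploit the estimate
        action = max(values, key=values.get)
    r = hidden_reward(action)                  # the only feedback we ever get
    counts[action] += 1
    values[action] += (r - values[action]) / counts[action]  # running mean

print(values)  # the learner discovers "b" is best without ever seeing a label
```

In the LLM framing from above, the "state" would be the text so far, the "action" would be the next token (or the whole response), and the reward would arrive only after the attempt.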
> Having spent a couple of weeks on Claude Code recently, I arrived to the conclusion that the net value for me from agentic AI is actually negative.
> For me it’s meant a huge increase in productivity, at least 3X.
> How do we reconcile these two comments? I think that's a core question of the industry right now.
Every success story with AI coding involves giving the agent enough context to succeed on a task it can see a path to success on. And every story where it fails is a situation where it didn't have enough context to see a path to success. Think about what happens with a junior software engineer: you give them a task and they either succeed or fail. If they succeed wildly, you give them a more challenging task. If they fail, you give them more guidance, more coaching, and less challenging tasks, with more personal intervention from you to break the work down into achievable steps.
As models and tooling become more advanced, the place where that balance lies shifts. The trick is to ride that sweet spot of task breakdown, guidance, and supervision.
From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input.
In particular when details are provided, in fact.
I find that with solutions likely to be well represented in the training data, a well-formulated set of *basic* requirements often leads, zero-shot, to "a" perfectly valid solution. I say "a" solution because there is still some probability (the seed factor) that it will not honour part of the demands.
E.g., build a to-do list app for the browser: persist entries into a hashmap, no duplicates, can edit and delete, responsive design.
I never recall seeing an LLM kick out C++ code for that. But I also don't recall any LLM satisfying all of these requirements, even though there aren't that many.
It may use a hash set, or even a plain set, for persistence because it avoids duplicates out of the box. Or it will use a hash map just to show it used a hashmap, but only as an intermediary data structure. It will be responsive, but the edit/delete buttons may not show, or may not be functional. Saving the edits may look like it worked when it did not.
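For what it's worth, the data-structure half of that spec is tiny. Here is a hedged sketch in Python (the real thing would be JavaScript in the browser; `TodoStore` and the id-keyed layout are my own illustrative choices, not anything an LLM produced):

```python
# What "persist entries into a hashmap, no duplicates, can edit and delete"
# amounts to as a data structure -- illustrative only.
class TodoStore:
    def __init__(self):
        self.items: dict[str, str] = {}   # the hashmap itself: id -> text

    def add(self, item_id: str, text: str) -> bool:
        if item_id in self.items:         # reject duplicates explicitly,
            return False                  # rather than relying on a set
        self.items[item_id] = text
        return True

    def edit(self, item_id: str, text: str) -> bool:
        if item_id not in self.items:
            return False
        self.items[item_id] = text
        return True

    def delete(self, item_id: str) -> bool:
        return self.items.pop(item_id, None) is not None

store = TodoStore()
store.add("1", "buy milk")
assert not store.add("1", "buy milk again")   # duplicate rejected
store.edit("1", "buy oat milk")
store.delete("1")
```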
The comparison with junior developers falls flat. Even a mediocre developer can test their work and won't pretend it works when it doesn't even execute. A developer who lies too many times loses trust. We forgive these machines because they are just automatons with a "can make mistakes" label on them. We have no recourse to make them speak the truth; they lie by design.
> From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input.
You may feel like there are all the details and no ambiguity in the prompt. But there may still be missing parts, like examples, structure, a plan, or a division into smaller parts (it can do that quite well if explicitly asked). If you give too many details at once, it gets confused, but there are ways to let the model access context as it progresses through the task.
And models are just one part of the equation. Other parts are the orchestrating agent, the tools, the model's awareness of the tools available, documentation, and maybe even a human in the loop.
I've given thousands of well-detailed prompts. Of those, a large enough portion yielded results that diverged from unambiguous instructions that I stopped, long ago, being fooled into thinking LLMs actually interpret instructions.
But if in your perspective it does work, more power to you I suppose.
> From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input.
Please provide examples, both of the problem and of your input, so we can double-check.
> And every story where it fails is a situation where it didn't have enough context to see a path to success.
And you know that because people are actively sharing the projects, code bases, programming languages and approaches they used? Or because your gut feeling is telling you that?
For me, agents have failed with enough context and with not enough context, succeeded with enough context and with not enough, and both succeeded and failed with and without "guidance and coaching".
I imagine there’s actually combinatorial power in there, though. If we imagine embedding something with only two dimensions, x and y, we can actually encode an unlimited number of concepts, because we can imagine distinct, separate clusters or neighborhoods spread out over a large 2D map. It’s of course much more powerful with more dimensions.
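As a toy illustration of that claim (my own sketch, not from the parent comment): pack a grid of well-separated "concept" centers into 2D and decode a noisy point by snapping it to the nearest center. With enough precision the number of distinguishable neighborhoods grows without bound, and extra dimensions only make the separation easier.

```python
# Even in 2D, well-separated cluster centers can each stand for a distinct
# concept, and a noisy point can still be decoded back to the correct one.
# (GRID and the names below are illustrative.)
import random

GRID = 100  # 100 x 100 = 10,000 distinct "concepts" packed into 2D
centers = {(x, y): f"concept_{x}_{y}" for x in range(GRID) for y in range(GRID)}

def decode(px: float, py: float) -> str:
    """Snap a noisy 2D point back to its nearest concept center."""
    return centers[(round(px), round(py))]

# Embed a concept at its center, jitter it a little, and recover it.
cx, cy = 42, 7
noisy = (cx + random.uniform(-0.4, 0.4), cy + random.uniform(-0.4, 0.4))
assert decode(*noisy) == "concept_42_7"
print(decode(*noisy))
```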
That is fun. But this one truly is enough: Turing Complete. You start with boolean logic gates and progressively work your way up to building your own processor, creating your own assembly language, and using it to do things like solve mazes and more. Super duper fun.
This is the video I wished I had seen when I was a kid, feeling like assembly was a dark art that I was too dumb to be able to do. Later in life I did a ton of assembly professionally on embedded systems. But as a kid I thought I wasn’t smart enough. This idea is poison, thinking you’re not smart enough, and it ruins lives.