How do you deal with the comments sometimes being relatively noisy for humans? I tend to be annoyed by comments overly referring to a past correction prompt and not really making sense by themselves, but then again this IS probably the highest value information because these are exactly the things the LLM will stumble on again.
> How do you deal with the comments sometimes being relatively noisy for humans?
To an extent, that is a function of tweaking the prompt to get the desired level of detail and signal-to-noise ratio from the LLM, e.g. constraining the word count it can use for comments.
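For instance, a hypothetical line in the agent's instructions (the exact wording here is made up, not from our actual prompt) might look like:

```
When adding reasoning comments, keep each one under 15 words and state
the constraint itself, not the conversation that produced it.
```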
We have a small team of approvers who review every PR. Since we can't see the original prompt or the flow of interactions with the agent, this approach lets us see all of that by proxy when reviewing the PR, so it is immensely useful.
Take enum values, for example. Why is this enum here? What is its use case? Is it needed? Having the reasoning dumped out allows us to understand what the LLM is "thinking".
(Of course, the biggest benefit is still that the LLM sees the reasoning from an earlier session again when reading the code weeks or months later).
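A minimal sketch of what this looks like in practice (the enum and its values are hypothetical, invented for illustration): each member carries a short "why" comment, so both a human reviewer and a future LLM session can recover the reasoning without the original conversation.

```python
from enum import Enum

class RetryPolicy(Enum):
    """How a client retries failed calls. Values are illustrative only."""
    NONE = "none"                # User-facing requests must fail fast.
    FIXED = "fixed"              # Kept for legacy batch jobs that assume fixed delays.
    EXPONENTIAL = "exponential"  # Default for new integrations; avoids thundering herd.

# The comments encode the constraints; the enum itself is just the artifact.
print(RetryPolicy.EXPONENTIAL.value)
```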
I personally do this and I can imagine a world in which it is popular with privacy/sovereignty enthusiasts. I have doubts that this share of people will be significant enough for many companies to cater their products to this model - but if anyone will, it will be Apple - and it would yield them a few extra Mac Studio sales and likely make much more profit than selling the same service.
It’s a big deal! Prompt processing was previously the Mac’s weak point. Sure, output-generation speed matters when the model is reciting files in programming, but in general conversation I’d rather have it output a short answer anyway (after extensive processing by a smart model).
> The damn thing _talks_. You can just _speak_ to it. You can just ask it to do what you want
I mean - yeah. So do humans. But it turns out that a lot of humans require considerable process to organize productively too. A pet thesis of mine is that we are just (re-)discovering the usefulness of process and protocol.
Oh my god. I hate this so much. Gemini’s Voice mode is trained to do this so hard that it can’t really even be prompted away. It completely derailed my thought process and made me stop using it altogether.
I agree that probably not everything should be stored - it’s too noisy. But the reason the session is so interesting is precisely the later part of the conversation - all the corrections in the details, where the actual, more precise requirements crystallize.
I call false dilemma. OP probably defines "code" as one of the languages precise enough to be suited for steering Turing machines. Thus, "code" is not the opposite of "prompt". They are apples and oranges.
Lawyers can code in English, but it is not to a layperson's advantage, is it?
And for example, if you prompt for something to frobnicate biweekly, there is no intelligence today, nor will there ever be, that can extract from the prompt whether you want the Turing machine to act twice a week or once every two weeks. It's a deficiency of the language, not of the intelligence.
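The ambiguity is concrete: over the same four weeks, the two readings of "biweekly" produce very different schedules. A small sketch (dates and the Monday/Thursday choice are arbitrary assumptions, just to make the two interpretations comparable):

```python
from datetime import date, timedelta

start = date(2025, 1, 6)  # a Monday, chosen arbitrarily
end = start + timedelta(weeks=4)

# Reading 1: "biweekly" = once every two weeks.
every_two_weeks = []
d = start
while d < end:
    every_two_weeks.append(d)
    d += timedelta(weeks=2)

# Reading 2: "biweekly" = twice a week (say, Monday and Thursday).
twice_a_week = []
d = start
while d < end:
    twice_a_week.append(d)                       # Monday
    twice_a_week.append(d + timedelta(days=3))   # Thursday
    d += timedelta(weeks=1)

print(len(every_two_weeks), len(twice_a_week))  # 2 vs 8 runs in the same window
```

Same word, a 4x difference in frobnication frequency - nothing in the prompt itself disambiguates it.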
Not at all, unless it contains very thorough reasoning comments (which arguably it should). The code is only an artifact, a lot of which is incidental and flexible. The prompts contain the actual constraints.
That’s what I do! I think it works well and helps future agents a lot in understanding why the codebase is the way it is. I do have to oversee the commit messages, but it does avoid a lot of noise and maybe it’s a normal part of HITL development.