Thanks for trying it! So far, the testing and validation stage has been left to the developer. While this could change in the future, my experience has been that the models aren't quite good enough yet to make an auto-test/auto-fix loop like you've described the default behavior. You end up with a lot of tire-spinning, burning tokens to fix issues that a human could resolve trivially.
I think it's better for now to use LLMs to generate the bulk of a task, then have the developer clean up and integrate rather than trying to get the LLM to do 100%.
That said, you can accomplish a workflow like this with Plandex already by piping output into context. It would look something like:
plandex new
plandex load relevant_context.ts some_more_context.ts
plandex tell 'some kind of complex task'
# ...Plandex does its thing, but doesn't get it 100% right
npm test | plandex load
plandex tell 'please fix the problems causing the failed tests'
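If the first pass still doesn't get everything, you can keep repeating the same loop until the tests pass (this is just the same load/tell cycle again, so treat it as a sketch rather than a fixed recipe):
# run the tests again and pipe the failures back into context
npm test | plandex load
plandex tell 'tests are still failing -- please fix the remaining issues'
Then review and apply the changes once you're happy with where it ended up.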
As the models improve, I'm definitely interested in baking this in to make it more automated.
There's a pretty huge range. It's a function of how much you load into context and how long the task is. So if you dump an entire directory with 100k tokens into context and then proceed to do a task that requires 20 steps, that will cost you a lot. Maybe >$10. But a small task where you're just making a few changes and only have a few small files in context (say 5k tokens) won't cost much at all, maybe like $0.10.
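To make that concrete, here's a very rough back-of-the-envelope. It assumes GPT-4-class pricing in the ballpark of $10 per 1M input tokens, and that the full context is re-sent on each step, so your actual numbers will vary with the model and how the plan unfolds:
  large plan:  ~100k context tokens x ~20 steps ≈ 2M input tokens
               2M x $10/1M ≈ $20 (plus output tokens on top)
  small plan:  ~5k context tokens x ~3 steps ≈ 15k input tokens
               15k x $10/1M ≈ $0.15 (plus output tokens on top)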
I haven't done too much digging into exactly how much people who are using Plandex Cloud are spending, but I'd say the range is quite wide even among people who are using the tool frequently. Some are doing small tasks here and there and not spending much (maybe on track for $5-10 per month), while I'd guess some of the heavier users are on track to spend hundreds per month.