Agree. To reduce costs: 1. Precompute frequently used knowledge and surface earl...

faangguyindia · 2025-08-24T05:33:23 1756013603

that info can be just included in preffix which is cache by LLM, reducing cost by 70-80% average. System time varies, so it's not good idea to specify it in prompt, better to make a function out of it to avoid cache invalidation.

I am still looking for a good "memory" solution, so far running without it. Haven't looked too deep into it.

Not sure how next tool call be predicted.

I am still using serial tool calls as i do not have any subagents, i just use fast inference models for directly tools calls. It works so fast, i doubt i'll benefit from parallel anything.