You can modify this; there are settings for
- how much context
- chunk size
We had to do this: 3 best matches at about 1,000 characters each was far more effective than the default I ran into of 15-20 snippets of 4 sentences each.
We also found a setting for when to cut off and/or start the chunk, and set it to double newlines.
Then we just structured our agentic memory into meaningful chunks with 2 newlines between each, and it gelled perfectly.
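To make the setup concrete, here is a minimal sketch of that chunking scheme: split memory on double newlines, merge neighbors up to roughly 1,000 characters, and keep the top 3 matches. The scoring function is a naive word-overlap stand-in for real embedding similarity, and all names here are illustrative, not from the actual system.

```python
def chunk_on_blank_lines(text: str, max_chars: int = 1000) -> list[str]:
    """Split on double newlines, then merge neighbors up to ~max_chars."""
    parts = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) + 2 > max_chars:
            chunks.append(current)
            current = part
        else:
            current = f"{current}\n\n{part}" if current else part
    if current:
        chunks.append(current)
    return chunks

def top_matches(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by word overlap with the query (embedding stand-in)."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:k]
```

Swapping the overlap score for cosine similarity over embeddings gives the behavior described above without changing the chunk boundaries.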
I have production agents which run vector search via FAISS locally (in their own environment, not a 3rd-party one), and for which I am creating embeddings for specific domains:
1. Agent memory (it's an AI coach, so this holds the unique training methods that allow for instant adoption of new skills and distilling best-fit skills for context)
2. User memory (the AI coach's memory of a user)
3. Session memory (for long conversations, instead of compaction or truncation)
Then separately I have coding agents which I give semantic search, the same FAISS system:
- on command they create new memories from lessons (consumes tokens)
- they vector search FAISS when needing more context (2x greater agent alignment/outcomes this way)
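One way to picture the per-domain setup above is one small vector store per memory domain. This is a hedged sketch, not the actual implementation: the embedding here is a deterministic hash-seeded placeholder, and in a real build a sentence-embedding model plus `faiss.IndexFlatIP` would replace the brute-force numpy search.

```python
import hashlib
import numpy as np

class MemoryStore:
    """Toy per-domain vector store; FAISS would replace the numpy search."""

    def __init__(self, dim: int = 8):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.texts: list[str] = []

    def _embed(self, text: str) -> np.ndarray:
        # Placeholder embedding: a unit vector seeded by a hash of the text.
        seed = int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "little")
        v = np.random.default_rng(seed).standard_normal(self.dim).astype(np.float32)
        return v / np.linalg.norm(v)

    def add(self, text: str) -> None:
        self.vectors = np.vstack([self.vectors, self._embed(text)])
        self.texts.append(text)

    def search(self, query: str, k: int = 3) -> list[str]:
        scores = self.vectors @ self._embed(query)  # cosine similarity
        return [self.texts[i] for i in np.argsort(-scores)[:k]]

# One store per domain, mirroring the three memory types listed above.
domains = {name: MemoryStore() for name in ("agent", "user", "session")}
```

Keeping the domains in separate indexes means a session lookup can never pull agent-level training material by accident.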
And finally, I forked OpenAI's Codex terminal agent code to add
- inbuilt vector search and injection
So I say, "Find any uncovered TDD opportunity matching intent to actuality for auth on these 3 repos, write TDD coverage, and bring failures to my attention."
The agents set my message to {$query},
vector search on {$query},
and embed the results in their context window
programmatically, so no token consumption (what a freaking dream).
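The injection flow described above can be sketched roughly like this: the user message becomes the search query, top chunks are retrieved, and they are spliced into the prompt before the model is ever called. `vector_search` is a hypothetical stand-in for the FAISS lookup, and the corpus entries are made up for illustration.

```python
PROMPT_TEMPLATE = """Relevant memory (retrieved for this request):
{snippets}

User request:
{query}"""

def vector_search(query: str, k: int = 3) -> list[str]:
    # Stand-in: a real implementation would query the FAISS index here.
    corpus = {
        "auth": "Auth module: JWT issuance lives in auth/tokens.py",
        "tdd": "TDD convention: every endpoint gets an intent test first",
    }
    return [text for key, text in corpus.items() if key in query.lower()][:k]

def build_context(user_message: str) -> str:
    """Assemble the prompt with retrieved snippets injected programmatically."""
    snippets = vector_search(user_message)
    return PROMPT_TEMPLATE.format(
        snippets="\n".join(f"- {s}" for s in snippets),
        query=user_message,
    )
```

Because the retrieval and templating happen outside the model, the lookup itself costs no tokens; only the final assembled prompt does.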
I am not training the agents yet, as in fine-tuning the underlying models.
I would love the simplest approach to test this, because at least with the Codex clone I could easily swap in local models, but I somehow doubt that they will be able to match the performance of the hosted models.
Especially because Claude Code just launched ahead of Codex in quality in the last week or so, and they are closed source. I'm seeing clear swarm agentic coding internally, which is a dream for context-window efficiency (in Claude Code as of today).
Today I'm adding a focus mode and the ability to share holons (top-level knowledge chunks and their children), and I'll use those features together to launch a personal bio site that agents can help curate, with semantic search access to the governing context within my knowledge base.
My thought is to then allow for quick template swapping to turn the works you curated with agents in your personal knowledge base into professional-grade websites, and to follow that with some agentic tools that would allow for instant creation of websites in a single prompt.
I have seen some other tools do this, like Lovable, and it felt like clickbait on my first try. They embedded all these fake links like "contact us," then the agents did not understand how to find and remove them; after going back and forth with the agents on literally the basics of not putting broken buttons out there, I hit a demo paywall.
I keep seeing the ads "get a site in one prompt," and I just wonder how they got so big without actually figuring out the basics of how to deliver that. I think an open-source alternative would be pretty nice, right?
Maybe one that actually works?
What should I do next? Want to see more demos? Want me to focus on the last list of boring things before the open-source release? (There are lots of little things, like reordering at the topmost level of the holons not working at the moment; that's going to consume some time to fix, with nominal benefit.)
What's the most outrageously good thing about this angle?
This is helpful. I'd love to see a demo of how tight you got the context-window injection against a query. That's where there's always 70% bloat in my previous systems.
I solved this by building holonically, roughly the same structure as you have, it seems. Through a UI I can grab a holon and inject it into context, including its children (a holon is a nested hierarchy), and I usually use semantic search, so I'll add that in as well.
I have not added agentic memory flows yet, like when a model asks itself whether it has what it needs and allows itself to look deeper... have you?
I have agentic flows for other things, about 15 cascading steps between the user message and the AI response, but have not done so with memory yet.
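For anyone picturing the holon idea: a minimal sketch is just a node with children, where "grabbing a holon" flattens it and everything beneath it into one context block. The names and structure here are illustrative assumptions, not the actual system.

```python
from dataclasses import dataclass, field

@dataclass
class Holon:
    """A knowledge chunk that contains child chunks (a nested hierarchy)."""
    title: str
    body: str
    children: list["Holon"] = field(default_factory=list)

def inject(holon: Holon, depth: int = 0) -> str:
    """Flatten a holon and all of its children into indented context text."""
    pad = "  " * depth
    lines = [f"{pad}# {holon.title}", f"{pad}{holon.body}"]
    for child in holon.children:
        lines.append(inject(child, depth + 1))
    return "\n".join(lines)
```

Selecting one holon in a UI and calling something like `inject` on it is what makes the injection tight: only that subtree enters the context window.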
Hey, we should talk. If your project is stable, I might have a collaboration that would align us both.
I have a functional UI for storing the knowledge base shared between my AI agents, and I can have an MCP server functional within a couple of days.
Right now it accesses personal instances of AI engines or the cloud, but it builds my private local knowledge base in the process. I can import in 1s from other systems as well.
An instant, navigable, lightning-fast, local UI for managing my AI's memory. I use semantic search for the lookups at the moment.
Sounds like perhaps the two tools together would add to each other.
What I'm working on today, pre tech+human conference:
Perks
- Extract any AI conversation instantly into your own personal knowledge base
- Instant one-click veto purge of AI output noise
- Holonic nesting: attention on only what matters
- Instant sync with GPT and Anthropic outputs
- Instant searching of your own personal knowledge base
- ZERO cloud requirements; runs on your local machine
Problems targeted
- AI alignment at scale
- Personal memory limitations when collaborating on AI outputs
- Personal AI conversations get lost instead of making the system smarter
- Personal time-management losses; AI output overwhelm => clarity and focus
- AI providers compromised; this reclaims data sovereignty
Coming next
- Instant conversion of your knowledge base into training data => train your personal AI => swap it in so it gets smarter over time
- Instant vectorization of your knowledge => AI-generated UI gets more aligned over time
Built to run locally, so you get an instantly searchable personal knowledge base that you can use, with one click, to export the data that trains your own personal AI,
locally ...
no shared DB, no trust issues, no cloud.
Instantly convert conversations with ChatGPT or Anthropic into your own personal knowledge base, curate actionable tasks from it, and act on them, while the system handles organizing and managing everything...
Let me know... I'm considering open source vs. closed source for the whole system, but it's definitely looking like it solves a major global pain point / opportunity with AI.
Produced as part of Next AI Labs Social Impact Works, creators of ixcoach.
There's a lot missing here, but I had the same issues; it takes iteration and practice. I use Claude Code in terminal windows, and a text expander to save explicit reminders that I have to inject super regularly, because Anthropic obscures access to system prompts.
For example, I have 3-to-8-paragraph-long instructions I will place regularly about not assuming, checking deterministically, etc., and for most things I have the agents write a report with a specific instruction set.
I pop the instructions into the text expander, so I just type -docs when saying "go figure this out, and give me the path to the report when done."
They come back with a path, and I copy it and search VS Code.
It opens as an .md file and I use preview mode; it's similar to a Google Doc.
And I'll review it. Always, things will be wrong: tons of assumptions, failures to check deterministically, etc. But I see that in the doc and have it fix them, correct misunderstandings, and update the doc until it's perfect.
From there I'll say, "Add a plan in a table with a status for each task based on this" (another text expander snippet with instructions).
And WHEN that's 100% right, I'll say, "Implement and update as you go." The "update as you go" forces it to recognize and remember the scope of the task.
The greatest point of failure in the system is misalignment. The ethics teams got that right. It compounds FAST if allowed: you let them assume things, they state assumptions as facts, that becomes what other agents read, and you get true chaos, unchecked.
I started rebuilding Claude Code from scratch literally because they block us from accessing system prompts, and I NEED these agents to stop lying to me about things that are not done or are assumed, which highlights the true chaos possible when this is applied to system-critical operations in governance or at scale.
I also built my own tool, like Codex, for managing agent tasks and making this simpler, but getting them to use it without getting confused is still a gap.
Let me know if you have any other questions. I am performing the work of 20 engineers as of today: I rewrote 2 years of back-end code that had required the full-time work of a team of 2 engineers, in 4 weeks, by myself, with this system... so I am, I guess, quite good at it.
I need to push my edges further into this latest tech; I have not tried the Codex CLI or the new tool yet.
It's a total of about 30 snippets, averaging 6 paragraphs long, that I have to inject. For each role switch it goes through, I have to re-inject them.
It's a pain, but it works.
Even with TDD it will hallucinate the mocks without management, and hallucinate the requirements. Each layer has to be checked atomically, but the text expander snippets, done right, can get it close to 75% right.
My main project faces 5,000 users, so I can't let the agents run freely, whereas with isolated projects in separate repos I can let them run more freely, then review in GitKraken before committing.
You could just use something like Roo Code with custom modes rather than manually injecting them. The orchestrator mode can decide on the other appropriate modes to use for subtasks.
You can customize the system prompts, baseline prompts, and models used for every single mode, and have as many or as few modes as you want.
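For reference, Roo Code's custom modes live in a project-level `.roomodes` file. This is a rough sketch from memory, so the exact field names should be verified against the current Roo Code docs before use:

```json
{
  "customModes": [
    {
      "slug": "doc-writer",
      "name": "Doc Writer",
      "roleDefinition": "You write reports. Never assume; check deterministically.",
      "groups": ["read", "edit"],
      "customInstructions": "Always return the path to the finished report."
    }
  ]
}
```

Defining the standing reminders once per mode like this replaces the manual re-injection on every role switch.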
I have an immediate use case for this. Can you stream via AI to support real-time chat this way?
Very very good!
Jonathan
founder@ixcoach.com
We deliver the most exceptional simulated life coaching, counseling and personal development experiences in the world through devotion to the belief that having all the support you need should be a right, not a privilege.
Test our capacity at ixcoach.com for free to see for yourself.
To some degree, yes. But there's a low value-to-cost ratio in that exact UX.
Take a single character in the game, and give that character the depth and nuance of a true experience with a Zen master / inquiry facilitator, powered by AI. IXCoach.com can do a phenomenal job powering this, so literally the only code needed for an MVP is the mod + character API.
Then the cost-benefit ratio is 400x, and in a day of coding you have taken a game that is mostly pure entertainment and provided a means for depth, nuance, and personal development that literally leads the market.
I pinged the executive producer of CD Projekt Red on this; it's viable.
I agree in the context of LLMs running locally. For API-connected games, cloud support for nuanced conversations would be a tremendous value add. Take a hit like Cyberpunk, create a mod that wires into a custom AI from ixcoach.com... we could literally integrate the most nuanced self-inquiry practices into the top games this way.
Anyone working on top games through mods who wants to explore this, let me know; Next AI Labs would be interested in supporting such efforts.
(Hope this helps.)