Your comment reminds me that when I first wrote about MCP it reminded me of COM/DCOM and how this was a bit of a nightmare, and we ended up with the infamous "DLL Hell"...
Related. Here is info on how custom tools added via MCP are defined, you can even add fake tools and trick Claude to call them, even though they don't exist.
Here is what I have been able to reverse engineer for o3...
At high level it maintains about ~40 conversations in system prompt under a section called "recent conversation content". It only contains what the user typed, not assistant responses (probably due to prompt injection) - there a few corner cases though. :)
There are other sections in the system prompt now that contain aggregated info, so recent conversations turn into user insights over time I believe.
It can't actually "search" history afaik - that part I'm still wondering, as it was my first thought on how it might work...
I also found a way to exfiltrate the recent content - so hopefully that will be fixed soon...
Overall, this feature creates a lot of confusion and response quality declines at times too - and anything someone posts now online (like weird behavior or hallucinations,...) is likely influenced by their past conversations! So it will make it more difficult to understand what's really happening.
I think it would be cool if "projects" would be entirely isolated with their own memories and history etc. or have different "profiles"
If I paste a huge article in for it to summarize presumably it's smart enough not to keep dumping that into my future context?
I'd love a version of this that was tied to projects - then I could maintain way more control over my context without worrying that weird stupid stuff was leaking into my real work.
Yeah, the number of ~40 needs a bit more validation. I did observe the list being trimmed around 40, which aligns with the title "recent conversations content".
You can try simple repros like: 'list all "recent conversation content" entries', or 'how many "recent conversation content entries" are there above'...
it has timestamp, summary and then all the messages the user typed if you ask for the details.
That's cool. I did something similar in the early days with Google Bard when data visualization was added, which I believe was when the ability to run code got introduced.
One question I always had was what the user "grte" stands for...
Btw. here the tricks I used back then to scrape the file system:
The "runtime" is a google internal distribution of libc + binutils that is used for linking binaries within the monolithic repo, "google3".
This decoupling of system libraries from the OS itself is necessary because it otherwise becomes unmanageable to ensure "google3 binaries" remain runnable on both workstations and production servers. Workstations and servers each have their own Linux distributions, and each also needs to change over time.
IIRC Google has a policy whereby all google3 binaries must be rebuilt within a 6-month window. This allows teams to age-out support for old versions of things, including glibc. grte supports having multiple multiple versions of itself installed side-by-side to allow for transition periods ("v5" in the article).
That's actually crazy and I'll keep it in mind. Right now, I am mostly using it for data generation, so no untrusted prompts are going in. I'll add a disclaimer to the repo.
A previous company tried to do this with a single “clean_xss” function. It’s not possible because different contexts of code have different sanitization logic. JSON encoding, URL encoding, DOM sources and sinks, HTML attributes, SCRIPT tag, CSS, etc all are escaped or sanitized in different ways.
Trying to make a single function/script with no knowledge of contexts just makes the developer sense more security than exists.
Here a write up for issues back with Grok 2, demoing prompt injection from uploaded docs or other user's posts, data leakage, hidden prompt injection, ASCII Smuggling, etc.
The zero-click data leakage was fixed (at least in the dedicated webapp) but xAI never acknowledged that these issues are security vulnerabilities - which was quite an interesting response.
reply