The title says you replaced RAG, but ChromaFs is still querying Chroma on every command — you replaced RAG's interface, not RAG itself. Which is actually the more interesting finding: the retrieval was never the bottleneck, the abstraction was.
Agents don't need better search. They need `grep`.
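To make the "agents need grep" point concrete, here's a minimal sketch of the kind of retrieval agents actually use: plain regex search over files, no embeddings involved. `grepFiles` and `FileEntry` are illustrative names I made up, not part of ChromaFs.

```typescript
// A grep-style search over in-memory files: exact pattern matching,
// line numbers included, no vector index required.
type FileEntry = { path: string; text: string };
type Hit = { path: string; line: number; text: string };

function grepFiles(files: FileEntry[], pattern: RegExp): Hit[] {
  const hits: Hit[] = [];
  for (const f of files) {
    f.text.split("\n").forEach((lineText, i) => {
      // Note: pattern should not use the /g flag, since .test()
      // with /g/ advances lastIndex between calls.
      if (pattern.test(lineText)) {
        hits.push({ path: f.path, line: i + 1, text: lineText });
      }
    });
  }
  return hits;
}
```

The agent asks for a literal symbol and gets back exact file/line locations it can open next, which is usually what a coding agent wants from "retrieval" anyway.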
Congrats, this looks neat, and surely great to have more TS products in the ecosystem.
One plugin or feature I'd like to see in an AI gateway:
*Cache* per unique request.
So if I send the same request (system, messages, temperature, etc.), I'd have the option to pull it from a cache (if it was already populated) and skip the LLM generation. This is much faster and cheaper - especially during development and testing.
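A minimal sketch of what that per-request cache could look like: hash the canonicalized request fields and memoize the response. All the names here (`LlmRequest`, `cachedGenerate`) are illustrative, not any gateway's actual API.

```typescript
import { createHash } from "node:crypto";

// Cache key = SHA-256 of the request fields that determine the output.
type LlmRequest = {
  system: string;
  messages: { role: string; content: string }[];
  temperature: number;
};

const cache = new Map<string, string>();

function cacheKey(req: LlmRequest): string {
  // Rebuild the object with a fixed key order so JSON.stringify
  // is deterministic for identical requests.
  const canonical = JSON.stringify({
    system: req.system,
    messages: req.messages,
    temperature: req.temperature,
  });
  return createHash("sha256").update(canonical).digest("hex");
}

async function cachedGenerate(
  req: LlmRequest,
  generate: (r: LlmRequest) => Promise<string>,
): Promise<string> {
  const key = cacheKey(req);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: skip the LLM call
  const out = await generate(req);   // cache miss: call the model once
  cache.set(key, out);
  return out;
}
```

Sending the same request twice only invokes the model once; the second call returns instantly from the map, which is exactly the dev/test speedup described above.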
Thank you! We have built out the cache system -- we do both simple caching (matching the request strings 100%) and also do semantic caching (returning a cache hit for semantically similar requests). More here - https://portkey.ai/docs/product/ai-gateway-streamline-llm-in...
The caching part isn't open source yet, but part of our internal workers. Would be very cool to open source it!
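Since the semantic side isn't open source, here's a rough sketch of the general technique (not Portkey's implementation): embed each cached prompt, and on lookup return the stored response whose embedding is closest to the new prompt, above a similarity threshold. The embedding step is elided; the 3-dim vectors are purely illustrative.

```typescript
// One cached entry: the prompt's embedding plus the stored response.
type Entry = { vec: number[]; response: string };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Return the best cached response with similarity >= threshold, else null.
function semanticLookup(
  entries: Entry[],
  queryVec: number[],
  threshold = 0.9,
): string | null {
  let best: Entry | null = null;
  let bestSim = threshold;
  for (const e of entries) {
    const sim = cosine(e.vec, queryVec);
    if (sim >= bestSim) {
      bestSim = sim;
      best = e;
    }
  }
  return best ? best.response : null;
}
```

The threshold is the whole trade-off here: too low and you serve stale answers to genuinely different questions, too high and it degenerates into exact matching.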
I pushed ChatGPT to its coding limits in a JavaScript (Node.js) interview flow, and learned how it reasons about the various challenges it's presented with.
I was blown away by what this magnificent AI can do - it seems ChatGPT can make the cut for a junior Node.js developer role :)
I get your point. There is certainly a spectrum here.
I can tell you (as the author) that for mission-critical assets with fewer 3rd-party dependencies, most prefer to use a hard-coded policy, or pull it via API per build in the CI/CD.
However, for more dynamic websites (like blogs) that tend to have many 3rd-party dependencies, it's very useful and effective to be able to update the policy with one click (or even automatically).