Hacker News | zmccormick7's comments

Good to know. I've heard great things about Context7, but haven't experimented with it yet.


As in the download itself didn't happen when you clicked the download button, or the installation failed?


Cool, I hadn't heard of Traycer. That does look quite similar!

Completely agree. I basically built Runner to codify the process I was already using manually with Claude Code and Gemini. A lot of developers seem to be settling on a similar workflow, so I'm betting that something like Runner or Traycer will be useful for a lot of devs.

I'll be curious to see how far the existing players like Cursor push in this direction.


I agree that's a major problem. It's not something I've solved yet. I suspect a web research sub-agent is likely what's needed, so it can pull in up-to-date docs for whatever library you need to work with.


Gemini is required for the context management sub-agent. You can use any of OpenAI, Anthropic, or Gemini for the main planning and coding agents, but GPT-5 performs the best in my experience. Claude 4 Sonnet works well too, but it's about twice as expensive.


The main thing we need to add is metadata filtering, as that's required for a lot of use cases. We're also thinking about adding hybrid search support and multi-factor ranking.
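For illustration, metadata filtering usually means restricting the candidate set before ranking by similarity. This is just a hypothetical sketch (the function names and metadata schema are illustrative, not minDB's actual API):

```python
# Hypothetical sketch of pre-filtering by metadata before vector ranking.
# Not minDB's API; the schema and function names are made up for illustration.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filtered_search(query_vec, vectors, metadata, filt, top_k=10):
    """Keep only ids whose metadata matches every filter key,
    then rank the survivors by cosine similarity."""
    ids = [i for i, m in enumerate(metadata)
           if all(m.get(k) == v for k, v in filt.items())]
    scored = sorted(((i, cosine(query_vec, vectors[i])) for i in ids),
                    key=lambda t: t[1], reverse=True)
    return scored[:top_k]
```

In a real index the filter would be pushed down into the ANN search rather than applied as a Python loop, but the contract is the same: filter, then rank.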


We've only done full benchmarking with the FIQA dataset, comparing minDB with Chroma. We're going to try it with Qdrant and Weaviate soon too, since they both support quantization, which will make for a more apples-to-apples comparison with our approach.

We did test uploading and querying a Wikipedia dump, which was ~35M vectors. Query latency was around 150ms and peak memory usage was 1.5GB. We couldn't test recall, though, because we didn't have queries with ground truths.
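A quick back-of-envelope on those numbers. The embedding size is an assumption (the dimensionality isn't stated in this thread), but it shows how much compression a 1.5GB footprint implies:

```python
# Back-of-envelope check on the reported 1.5 GB peak for ~35M vectors.
# Assumed (not stated in the thread): 768-dim float32 embeddings.
n_vectors = 35_000_000
dim = 768

full_precision_gb = n_vectors * dim * 4 / 1e9  # float32, no compression
print(f"{full_precision_gb:.0f} GB uncompressed")  # ~108 GB

# A 1.5 GB peak implies roughly this many bytes per vector in memory:
bytes_per_vector = 1.5e9 / n_vectors
print(f"~{bytes_per_vector:.0f} bytes/vector")  # ~43 bytes, roughly 70x compression
```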


Agreed that thresholds don't work when applied to the cosine similarity of embeddings. But I have found that the similarity scores returned by high-quality rerankers, especially Cohere's, are consistent and meaningful enough that thresholding works well there.


I use a similarity threshold (to remove absolutely irrelevant results) and then use a reranker to get the top N.


Agreed. Retrieval performance is very dependent on the quality of the search queries. Letting the LLM generate the search queries is much more reliable than just embedding the user input. Also, no retrieval system is going to return everything needed on the first try, so using a multi-step agent approach to retrieving information is the only way I've found to get extremely high accuracy.
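The multi-step loop described here can be sketched as follows. All three helpers are dummy stand-ins (in practice the first and third would be LLM calls, and `search` would hit a vector index):

```python
# Sketch of a multi-step agentic retrieval loop. All helpers are dummies;
# a real implementation would replace them with LLM calls and a vector index.

def llm_generate_queries(user_input, gathered):
    # Dummy: a real version asks the LLM to write targeted search queries,
    # conditioned on what has already been retrieved.
    return [user_input] if not gathered else [user_input + " details"]

def search(query):
    # Dummy corpus lookup standing in for a vector index.
    corpus = {"setup": ["doc: install steps"],
              "setup details": ["doc: config flags"]}
    return corpus.get(query, [])

def context_sufficient(user_input, gathered):
    # Dummy: a real version asks the LLM whether it can answer yet.
    return len(gathered) >= 2

def agentic_retrieve(user_input, max_steps=3):
    gathered = []
    for _ in range(max_steps):
        for q in llm_generate_queries(user_input, gathered):
            for doc in search(q):
                if doc not in gathered:
                    gathered.append(doc)
        # Stop early once the context is judged sufficient.
        if context_sufficient(user_input, gathered):
            break
    return gathered
```

The key point is the loop: each round can issue new queries informed by what the previous round found, which is what single-shot embedding of the user input can't do.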


The queries you see and the resulting user interaction should be trained into the embedding model.

This is a foundational problem that requires your data. The way you search Etsy is different from the way you search Amazon. The queries these systems see are different, and so are the desired results.

Trying to solve the problem with pretrained models is not currently realistic.
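Concretely, the first step is turning interaction logs into training pairs for the embedding model. This is a sketch with a hypothetical log format; the pairs would then feed a contrastive fine-tuning objective (e.g. a multiple-negatives ranking loss):

```python
# Sketch: build contrastive training pairs from click logs.
# The log schema here is hypothetical, purely for illustration.

def build_training_pairs(click_logs):
    """click_logs: iterable of {"query": str, "clicked": [ids], "shown": [ids]}."""
    pairs = []
    for entry in click_logs:
        clicked = set(entry["clicked"])
        for doc in entry["clicked"]:
            # Positive: a document the user actually engaged with.
            # Hard negatives: documents shown for this query but not clicked.
            negatives = [d for d in entry["shown"] if d not in clicked]
            pairs.append({"query": entry["query"],
                          "positive": doc,
                          "negatives": negatives})
    return pairs
```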


I had the same issue when searching for specific companies/products. It feels like a pretty basic vector search with no hybrid search component or reranking.
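That failure mode (exact names like companies/products) is the classic argument for hybrid search. One common way to combine a keyword (BM25-style) ranking with a vector ranking is reciprocal rank fusion; this is a sketch of standard RRF, not any particular product's implementation:

```python
# Reciprocal rank fusion (RRF): merge a keyword ranking and a vector
# ranking into one list. Exact-name matches the embedding misses will
# still surface via the keyword side.

def rrf_fuse(keyword_ranking, vector_ranking, k=60):
    """Each ranking is an ordered list of doc ids, best first."""
    scores = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc in enumerate(ranking):
            # Standard RRF score: 1 / (k + rank); k damps the head of each list.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear in both lists get credit from both, so a product name that ranks first in keyword search and mid-pack in vector search still ends up on top.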

