Show HN: Boomerang, a new embedding model for RAG and semantic search (vectara.com)
18 points by TastyLamps on Sept 26, 2023 | 13 comments



Great to see another company join the LLM-builder tier. Good luck Vectara!


I work at Vectara and I'm curious -- are folks here using Retrieval-Augmented Generation (RAG)? What's your stack and what kind of improvements have you seen in answer quality?


pinecone + custom built ingestion & retrieval for codebase RAG (for the purpose of code search and understanding)
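That kind of pipeline can be sketched in a few lines. The snippet below is a toy stand-in, not Pinecone's actual API: `embed` is a bag-of-words placeholder for a real embedding model, and the brute-force ranking stands in for a vector index query.

```python
import math
import re
from collections import Counter

def embed(text):
    """Bag-of-words stand-in for a real embedding model (toy only)."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def retrieve(query, chunks, top_k=2):
    """Rank chunks by similarity to the query; return the best top_k."""
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:top_k]

# Hypothetical code chunks produced by an ingestion step.
chunks = [
    "def parse_config(path): return json.load(open(path))",
    "def cosine(a, b): return dot(a, b) / (norm(a) * norm(b))",
    "class LRUCache: evicts the least recently used entry",
]
best = retrieve("how is cosine similarity computed", chunks, top_k=1)
```

In a real deployment you would swap `embed` for a model call and `retrieve` for a query against the vector store; the chunking and ranking shape stays the same.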


Clickhouse + Custom Reranker


As one of the people behind the development of Boomerang, I'd add that we've tried to be as objective as possible in evaluating our model, and we've reported results on datasets where we do better as well as worse than other commercial offerings and models available on HuggingFace.


What are the limitations and challenges of Boomerang in terms of scalability to a large corpus with tens of millions of questions? (I know the answer, as I am one of the founders of Vectara; asking this for the benefit of others.)


> Note that while Boomerang is optimized for low-latency performance, models like GTR-XXL, which weighs in at 4.8 billion parameters, are very challenging to productionize.

So what is the size of your model then, or did I miss something?


How does Boomerang handle the trade-off between speed and accuracy? Does it sacrifice the quality of the results for faster response time? (I know the answer, as I am one of the founders of Vectara; asking this for the benefit of others.)


The metrics presented in the blog post are those of our production model. When designing Boomerang, we tried to strike a balance between latency and search relevance that works for most use cases.

On the other hand, GTR-XXL is an example of a research model that biases in favor of search relevance, at the expense of latency. It's not really practical to deploy in production environments as a result.


how good is it for code RAG applications?


So far, we haven't really focused on code ingestion. We've had a few users try it out for that use case, but code ingestion and generation are a bit different. We've found that a lot of users have success in natural-language areas (ingesting enterprise content, ecommerce content, etc.) and then building chatbots on top of the all-in-one API.


true, code is very different than natural language. any plans for incorporating it?


Using Boomerang can significantly improve your end-to-end RAG performance: retrieving the most relevant facts (or chunks) matters, a lot!
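The "retrieved chunks feed the generator" step is simple to see in code. This is a generic RAG-prompt sketch, not Vectara's API; the function name and prompt wording are illustrative.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a grounded prompt: retrieved facts first, then the question.

    If retrieval returns the wrong chunks, the generator never sees the
    relevant facts -- which is why retrieval quality dominates end-to-end
    answer quality.
    """
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical chunks as they might come back from a retrieval step.
prompt = build_rag_prompt(
    "What port does the service listen on?",
    ["The service listens on port 8080.", "Logs are written to /var/log/app."],
)
```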



