Show HN: Boomerang, a new embedding model for RAG and semantic search (vectara.com)
18 points by TastyLamps on Sept 26, 2023 | 13 comments



Great to see another company join the LLM-builder tier. Good luck Vectara!


I work at Vectara and I'm curious -- are folks here using Retrieval-Augmented Generation (RAG)? What's your stack and what kind of improvements have you seen in answer quality?


pinecone + custom built ingestion & retrieval for codebase RAG (for the purpose of code search and understanding)
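That kind of pipeline can be sketched in a few lines. The snippet below is a toy stand-in, not Pinecone's actual API: `embed` is a bag-of-words placeholder for a real embedding model, and the brute-force ranking stands in for a vector index query.

```python
import math
import re
from collections import Counter

def embed(text):
    """Bag-of-words stand-in for a real embedding model (toy only)."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def retrieve(query, chunks, top_k=2):
    """Rank chunks by similarity to the query; return the best top_k."""
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:top_k]

# Hypothetical code chunks produced by an ingestion step.
chunks = [
    "def parse_config(path): return json.load(open(path))",
    "def cosine(a, b): return dot(a, b) / (norm(a) * norm(b))",
    "class LRUCache: evicts the least recently used entry",
]
best = retrieve("how is cosine similarity computed", chunks, top_k=1)
```

In a real deployment you would swap `embed` for a model call and `retrieve` for a query against the vector store; the chunking and ranking shape stays the same.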


Clickhouse + Custom Reranker


As one of the people behind the development of Boomerang, I'd add that we've tried to be as objective as possible in evaluating our model, and we've reported results on datasets where we do better as well as worse than other commercial offerings and models available on HuggingFace.


What are the limitations and challenges of Boomerang in terms of scalability to a large corpus with tens of millions of questions? (I know the answer, as I am one of the founders of Vectara; asking this for the benefit of others.)


> Note that while Boomerang is optimized for low-latency performance, models like GTR-XXL, which weighs in at 4.8 billion parameters, are very challenging to productionize.

So what is the size of your model then, or did I miss something?


How does Boomerang handle the trade-off between speed and accuracy? Does it sacrifice the quality of the results for faster response time? (I know the answer, as I am one of the founders of Vectara; asking this for the benefit of others.)


The metrics presented in the blog post are those of our production model. When designing Boomerang, we tried to strike a balance between latency and search relevance that works for most use cases.

On the other hand, GTR-XXL is an example of a research model that biases in favor of search relevance, at the expense of latency. It's not really practical to deploy in production environments as a result.


how good is it for code RAG applications?


So far, we haven't really focused on code ingestion. We've had a few users try it out for that use case, but code ingestion and generation are a bit different. We've found that a lot of users have success in natural-language areas (ingesting enterprise content, ecommerce content, etc.) and then building chatbots on top of the all-in-one API.


true, code is very different than natural language. any plans for incorporating it?


Using Boomerang can significantly improve your end-to-end RAG performance: retrieving the most relevant facts (or chunks) matters, a lot!
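The "retrieved chunks feed the generator" step is simple to see in code. This is a generic RAG-prompt sketch, not Vectara's API; the function name and prompt wording are illustrative.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Assemble a grounded prompt: retrieved facts first, then the question.

    If retrieval returns the wrong chunks, the generator never sees the
    relevant facts -- which is why retrieval quality dominates end-to-end
    answer quality.
    """
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical chunks as they might come back from a retrieval step.
prompt = build_rag_prompt(
    "What port does the service listen on?",
    ["The service listens on port 8080.", "Logs are written to /var/log/app."],
)
```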



