Pinecone integrates AI inferencing with vector database (blocksandfiles.com)
21 points by jimminyx 12 hours ago | 15 comments





The title was a little misleading to me (maybe a skill issue on my part) because I associated "inferencing" with "generation".

After reading the article, it seems Pinecone just now supports in-DB vectorization, a feature that is shared by:

- DataStax Astra DB: https://www.datastax.com/blog/simplifying-vector-embedding-g... (since May 2024)

- Weaviate: https://weaviate.io/blog/introducing-weaviate-embeddings (as of yesterday)


Astra DB seems to just be a tutorial showing how to generate embeddings using another service.

Weaviate seems to have added a similar capability — kind of wild that they announced on the same day.

Looks like Pinecone also includes reranking as part of the same process — did Weaviate add that as well?


No doubt, it's technically great that Pinecone trained their own embeddings model, but from a business/customer standpoint I can't help but ask _why?_ This is one of those "build it or buy it" cases where teams must decide to either integrate with an existing solution or build their own. I'm not sure I see the advantage (from an end-user perspective) of using Pinecone's home-rolled embeddings model over, say, OpenAI's, especially given the cost factor: OpenAI embeddings cost very little.

> Astra DB seems to just be a tutorial showing how to generate embeddings using another service.

The link I shared showed how a single request to Astra DB's data API has Astra DB automatically create embeddings behind the scenes, integrating with an embedding service the user chooses when they set their database up. Indeed, embeddings are generated by another service and not in-house, but from an end-user perspective, they no longer need to generate embeddings themselves, as was the prior art, coordinating requests across three steps:

- get text

- generate embeddings

- take the embeddings and send them to the DB

As of May, when they announced Vectorize, one request did all of that. From an end-user perspective, I believe this is really analogous to what Weaviate and Pinecone are offering, unless I'm missing something.
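
For concreteness, a minimal sketch of that flow with the astrapy client (token, endpoint, and collection name are placeholders, and this assumes the collection was created with a vectorize-enabled embedding provider):

    # One insert call; Astra DB generates the embedding server-side
    # via the $vectorize field.
    from astrapy import DataAPIClient

    client = DataAPIClient("APPLICATION_TOKEN")    # placeholder token
    db = client.get_database("API_ENDPOINT")       # placeholder endpoint
    collection = db.get_collection("docs")         # hypothetical collection

    collection.insert_one({"$vectorize": "Vector DBs are neat."})

    # Query with raw text; the server embeds it and runs the search.
    hits = collection.find(sort={"$vectorize": "what are vector DBs?"},
                           limit=3)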


The only reason I can see for this is to create lock-in. I'd be pretty surprised if any more than 5% of their customers wanted a model by Pinecone.

This is a common feature now. If anything, for being so early to vector databases, Pinecone was rather late to integrating embeddings.

Timescale added it most recently, but yes, a bunch of others have it too: Weaviate, Spice AI, Marqo, etc.


Do any of the others also handle reranking?

Qdrant does with its ‘Query API’.

https://qdrant.tech/documentation/concepts/hybrid-queries/

And handles embedding creation with its fastembed package.

https://github.com/qdrant/fastembed
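
With the fastembed extra, the client itself does the embedding (a rough sketch; the collection name and texts are made up):

    # qdrant-client with fastembed installed: add() embeds documents
    # locally, query() embeds the query text the same way.
    from qdrant_client import QdrantClient

    client = QdrantClient(":memory:")  # throwaway in-memory instance

    client.add(
        collection_name="demo",
        documents=["Pinecone added integrated inference",
                   "Qdrant handles hybrid queries and reranking"],
    )

    hits = client.query(collection_name="demo",
                        query_text="who handles reranking?", limit=2)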


I don't know about them, but Manticore does.

https://manticoresearch.com/use-case/vector-search/


Can someone please explain how this works?

I assumed that a specific flavour of LLM was needed, an “embedding model”, to generate the vectors. Is this announcement that Pinecone is adding their own?

Is it better or worse than the models here: https://ollama.com/search?c=embedding For example?


Normally you take your content and run it through an embedding model and insert the resulting vectors into the vector DB. At query time, you run the query through the same embedding model and ask the vector database for the hits most similar to the resulting embedding vector. Reranking works similarly: you get the broad hits from the embedding similarity search and/or BM25, and then a reranker uses the looked-up source material to rank those results more finely.
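
A rough sketch of that manual flow with sentence-transformers (the model names are just common defaults, and a plain list stands in for the vector DB):

    # Classic flow: embed content, store vectors, embed the query,
    # take the broad hits, then rerank them with a cross-encoder.
    from sentence_transformers import CrossEncoder, SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    docs = ["Pinecone is a vector database.",
            "Rerankers reorder retrieved results.",
            "BM25 is a lexical ranking function."]

    doc_vecs = embedder.encode(docs)               # "insert" step
    query = "what reorders search results?"
    query_vec = embedder.encode(query)             # "query" step

    scores = util.cos_sim(query_vec, doc_vecs)[0]  # broad hits
    top = scores.argsort(descending=True)[:2]

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    reranked = reranker.predict([(query, docs[i]) for i in top])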

This is building it into the vector DB such that you send it the content and it is "built in".

Seems silly. It's like bundling a stove with cookware. But cookware fits specific niches and has a different life cycle. I get that it might cater to some "drop-in solution" targets, but it seems of no value for most engineered, long-term solutions.


There's more technical detail here: https://www.pinecone.io/blog/integrated-inference/

> Is this announcement that pinecone is adding their own?

TLDR: they trained their own embeddings model and rely on Cohere for ranking. Pinecone (the database) uses this model automatically to generate and store embeddings.
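
The hosted-inference calls look roughly like this (a sketch from memory of their docs; treat method and model names as approximations rather than gospel):

    # Pinecone's hosted inference endpoints: embed and rerank.
    from pinecone import Pinecone

    pc = Pinecone(api_key="API_KEY")        # placeholder key

    vecs = pc.inference.embed(
        model="multilingual-e5-large",      # a Pinecone-hosted model
        inputs=["integrated inference demo"],
        parameters={"input_type": "passage"},
    )

    ranked = pc.inference.rerank(
        model="cohere-rerank-3.5",          # placeholder reranker id
        query="what did Pinecone launch?",
        documents=["doc one", "doc two"],
    )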

> I assumed that a specific flavour of LLM was needed, an “embedding model” to generate the vectors.

You're mostly right, with one caveat: embedding models aren't really LLMs in that they're not very large; they just map semantic meaning into a numerical space.
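
A tiny illustration of that mapping with a small open model (sentence-transformers; the sentences are made up):

    # Nearby meanings land nearby in vector space: the paraphrase scores
    # higher cosine similarity than the unrelated sentence.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # ~22M params, not an LLM
    a, b, c = model.encode(["The cat sat on the mat.",
                            "A cat was sitting on a rug.",
                            "Quarterly earnings beat estimates."])

    print(util.cos_sim(a, b))  # high similarity
    print(util.cos_sim(a, c))  # low similarity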

> Is it better or worse than the models here: https://ollama.com/search?c=embedding For example?

This is the golden question. As far as I know, there is no appropriate benchmarking/eval data about this. I think the real value is the first-class integration between their model and their service.


I think the general rule is "the smarter the model, the better the embedding", but I can't cite a paper right now. So in theory GPT-4 would give better embeddings (if extracted from the middle layers), but that would be overkill.
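
To make "extracted from the middle layers" concrete: with Hugging Face transformers you can mean-pool a mid-layer hidden state into a vector (GPT-2 here purely as a stand-in, since GPT-4's weights aren't available):

    # Pull a middle layer's hidden states and mean-pool into one vector.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

    inputs = tok("vector databases", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)

    middle = out.hidden_states[len(out.hidden_states) // 2]  # a middle layer
    embedding = middle.mean(dim=1)  # (1, hidden_size) sentence embedding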

This post has some more technical info: https://www.pinecone.io/blog/integrated-inference/

Makes a lot of sense to me to combine embedding, retrieval, and reranking. I can imagine this being a way for them to differentiate themselves from the popular databases that have added support for vector search.


Nothing new; Marqo has been doing this for a while now with its all-in-one platform to train, embed, retrieve, and evaluate.

I've played around with Weaviate & Astra DB but Marqo is the best and easiest solution imo.


txtai (https://github.com/neuml/txtai) has had inline vectorization since 2020. It supports Transformers, llama.cpp and LLM API services. It also has inline integration with LLM models and a built-in RAG pipeline.
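
A minimal example (assuming recent txtai defaults, which pull a small sentence-transformers model):

    # txtai embeds raw text inline at index time and at query time.
    from txtai import Embeddings

    embeddings = Embeddings()  # default embedding model
    embeddings.index(["Pinecone added integrated inference",
                      "txtai has had inline vectorization since 2020"])

    print(embeddings.search("inline embedding generation", 1))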


