
We have to be honest - a "vector" database is low-tech stuff compared to today's AI. You shouldn't expect to walk into the battle for AI, which is arguably the most important one of our lifetime, and dig a significant chunk of profit out of the major AI players' pockets with some low-tech stuff. They use external "vector databases" for now because they don't want to invest R&D resources in such non-key issues for now.

"For now" is the keyword here.

When the company grows to 10k or 30k people, there will be teams competing for visibility, and someone is going to build an in-house "vector database" to get their slice of the pie. Do you still believe any major AI player is going to rely on some external vector database?



Are in-house databases that common? I thought we'd generally found, as an industry, that databases are a great thing to purchase. I do wonder how many will need anything other than the vector support in their already existing Postgres instances, though.


> I do wonder how many will need anything other than the vector support in their already existing Postgres instances though

Exactly! If there were real, strong demand, we'd see the open source ones get upgraded and ready within months. It feels more like one of those "I want to build something easy at the core but fancy in name to get some quick VC $" plays.
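
To make the "just use Postgres" point concrete, here is a rough sketch of what that looks like with the pgvector extension. The table name, column names, and connection string are made up for illustration, and the query vector would really come from an embedding model:

  # Minimal sketch, assuming pgvector is installed and a hypothetical
  # "documents" table with a vector "embedding" column already exists.
  import psycopg2

  conn = psycopg2.connect("dbname=app")  # made-up connection string
  cur = conn.cursor()

  query_vec = [0.12, -0.03, 0.88]  # placeholder; would come from an embedding model
  vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"

  # pgvector's <-> operator orders rows by L2 distance to the query vector.
  cur.execute(
      "SELECT id, content FROM documents "
      "ORDER BY embedding <-> %s::vector LIMIT 5",
      (vec_literal,),
  )
  for row in cur.fetchall():
      print(row)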


My understanding is that these vector search databases are generally used by people who want to take an existing AI and extend it with RAG, etc. Was anyone ever expecting the major AI players to use a tool like this, as you're suggesting?


> use an existing AI and extend it with RAG etc

Companies like OpenAI will fill the gap by offering such features out of the box. There is no logical reason a big AI tech giant would do all the hard work and then let someone else take the profit by ignoring the last-mile issue. In fact, OpenAI already released such APIs at their last DevDay event.
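
For context, the embedding side of this is already exposed through their API today; a rough sketch with the openai Python client (the model name is just an example, and where the resulting vector gets stored and searched is exactly the part a vector database or a hosted retrieval feature handles):

  # Rough sketch using the openai Python v1 client; model name is an example.
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  resp = client.embeddings.create(
      model="text-embedding-3-small",
      input=["What did the last DevDay announce?"],
  )
  vector = resp.data[0].embedding  # a list of floats you still have to store and search somewhere

  print(len(vector))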


That implies OpenAI wanting to host all your data somewhere in their vector storage.

The real issue is actually Google Cloud or AWS having a vector search solution. Or companies having a Lucene-based search engine or an existing DB for vector-based search.


Vector databases are complementary to "today's AI" as they store and index embeddings, which are a key output of large language models. As LLMs get better at generating text and images, their embeddings also get better.
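
In other words, the core workload is nearest-neighbour search over those embeddings. A brute-force sketch with numpy (the vectors here are random placeholders; real ones would come from an embedding model) shows the operation a vector database indexes to make fast at scale:

  # Brute-force cosine-similarity search over stored embeddings.
  import numpy as np

  rng = np.random.default_rng(0)
  stored = rng.normal(size=(10_000, 768))   # placeholder corpus embeddings
  query = rng.normal(size=768)              # placeholder query embedding

  # Normalise so a dot product equals cosine similarity.
  stored_n = stored / np.linalg.norm(stored, axis=1, keepdims=True)
  query_n = query / np.linalg.norm(query)

  scores = stored_n @ query_n
  top5 = np.argsort(scores)[-5:][::-1]      # indices of the 5 most similar rows
  print(top5, scores[top5])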


Can you name a single 10k-30k person company that has its own internally built relational database? Its own internally developed document database? I've never seen this in my career.

I don't even know of any 10-30k person companies that build their own search; most I've known use Elasticsearch or Lucene.

It's hard to imagine that a few of these vector DB companies won't establish themselves as the standard solution, becoming the equivalent of MongoDB in their space. The other competent players will very likely get acquired.

Certainly these vector DB companies are in better long-term standing than the bajillion companies rushing to build products that are just calling an API endpoint at the end of the day.


Please read my reply again. I was talking about AI companies (e.g. OpenAI) building their own vector databases once they have large enough teams.

As for your question: Google built Spanner (that was like 12 years ago) and LevelDB when it had about 30k people. Meta/Facebook forked LevelDB and started building RocksDB when it had fewer than 30k people.

No one is saying that banking, insurance, or retail companies with 30k employees should be building their own databases/search engines. They shouldn't. That is actually my main argument - those companies shouldn't get too involved with infra like that; they should just take the offerings of the big tech companies.



