This tutorial is very complex. Here's how to get free semantic search with much less complexity:
1. Install sentence-transformers [1]
2. Initialize the MiniLM model - `model = SentenceTransformer('all-MiniLM-L6-v2')`
3. Embed your corpus [2]
4. Embed your queries, then search the corpus
This runs at roughly 750 sentences/second on CPU and about 18k sentences/second on GPU. You can embed paragraphs instead of sentences if you need longer passages. The embeddings are accurate [3] and only 384 dimensions, so they're space-efficient [4].
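For concreteness, here's a minimal sketch of steps 1-4 using sentence-transformers' built-in `util.semantic_search` helper; the corpus and query are placeholders:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('all-MiniLM-L6-v2')

# Placeholder corpus; swap in your own documents.
corpus = [
    "A man is eating food.",
    "A monkey is playing drums.",
    "Someone is riding a horse.",
]
# Each embedding is a 384-dimensional tensor.
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# Embed the query, then rank the corpus by cosine similarity.
query_embedding = model.encode("What is the man doing?", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], hit['score'])
```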
Here's how to handle persistence. I recommend starting with the simplest strategy, and only getting more complex if you need higher performance (sketches of the first two options follow the list):
- Just save the embedding tensors to disk, and load them when you need them later.
- Use Faiss [5] to store the embeddings; it builds an index so retrieval is faster.
- Use pgvector, a Postgres extension that stores embeddings.
- If you really need it, use a dedicated vector database like Qdrant, Weaviate, or Pinecone.
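The first option is a couple of lines with torch (assuming the `corpus_embeddings` tensor from the snippet above; the filename is just an example):

```python
import torch

# Persist the tensors once, reload them on later runs.
torch.save(corpus_embeddings, "corpus_embeddings.pt")
corpus_embeddings = torch.load("corpus_embeddings.pt")
```

And a sketch of the Faiss option, using an exact inner-product index; normalizing the vectors first makes inner product equal cosine similarity:

```python
import faiss

# encode() returns float32 numpy arrays by default.
emb = model.encode(corpus)
faiss.normalize_L2(emb)                  # inner product == cosine similarity
index = faiss.IndexFlatIP(emb.shape[1])  # exact search; use an ANN index at scale
index.add(emb)
faiss.write_index(index, "corpus.faiss")

# Later: reload the index and search with an embedded query.
index = faiss.read_index("corpus.faiss")
query = model.encode(["What is the man doing?"])
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)
```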
This setup is much simpler and cheaper than wiring up a ton of cloud services to do embeddings. I've used it for https://www.endless.academy and https://www.dataquest.io, and it's worked well in production. I don't know why people make semantic search so complex.
TBH, it does not look "less complex" to me, not at all. :) Install this, install that... but where do you install and run all of it? The topic is "serverless": that means you don't run anything yourself, you just need two cloud APIs and a Lambda script.
How would you host the sentence-transformers model for free? You need it to vectorize each query, so it has to be hosted somewhere. Is there any way to do that for free?
Just run it on CPU, on your own machine; that's the cheapest way. You could also use a free or cheap VPS, and even parallelize across multiple machines or cores if you need to (see the sketch below).
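If you do need more throughput, sentence-transformers ships a multi-process encoder; here's a sketch, with an arbitrary worker count and a placeholder corpus:

```python
from sentence_transformers import SentenceTransformer

if __name__ == "__main__":  # guard required for multiprocessing
    model = SentenceTransformer('all-MiniLM-L6-v2')
    sentences = [f"sentence {i}" for i in range(100_000)]  # placeholder corpus

    # One worker per device; here, four CPU processes.
    pool = model.start_multi_process_pool(target_devices=["cpu"] * 4)
    embeddings = model.encode_multi_process(sentences, pool)
    model.stop_multi_process_pool(pool)
    print(embeddings.shape)  # (100000, 384)
```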
Maybe I'm grumpy today but I am shocked at how many responses you are getting where people think this is a novel idea. Has the engineering mindset really shifted into a default of "buy" even when build could take less than a week?
I was surprised, too, but then I realized they all work at Qdrant.
But the general dialogue around AI-related tooling is surprising to me. The production parts of LangChain, embedding services, and similar tools can usually be built in a few hours, with better observability, performance, and maintainability.
[1] https://www.sbert.net/
[2] https://www.sbert.net/examples/applications/semantic-search/...
[3] https://huggingface.co/blog/mteb
[4] https://medium.com/@nils_reimers/openai-gpt-3-text-embedding...
[5] https://github.com/facebookresearch/faiss