I just made a couple of searches with Teclis. I have to say, it's not bad. It's clearly not complete and I get several empty searches, but the content of the results is of higher quality than what I get with Google or DDG. Nice work!
Thanks. The index is tiny and it is just a proof of concept of what a single person can do with the technologies available nowadays. I felt it was better for it to return zero results than bad results.
As the site says, this demo is by no means meant as a replacement for Google, but rather as a complement to it. I would say Teclis is good for content discovery and for learning new things outside the typical search engine filter bubble. A few examples of good queries are listed on the site.
Not the author, but at work we've had indexes in the hundreds of millions of vectors. Faiss can certainly scale.
If you do have a tiny index and want to try Google's version of vector search (as an alternative to Faiss), you can easily run ScaNN locally [1] (linked in the article; that's the underlying tech). At small scale I had better performance with ScaNN.
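In case it helps anyone, here is a minimal local ScaNN sketch. This is my own toy setup, not the author's configuration; the tree/score_ah parameters are illustrative values loosely following ScaNN's README example.

    import numpy as np
    import scann

    # Toy corpus of unit-normalized embedding vectors.
    dim = 128
    corpus = np.random.rand(100_000, dim).astype(np.float32)
    corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

    # Tree partitioning + asymmetric hashing + exact re-ranking of the top candidates.
    searcher = (
        scann.scann_ops_pybind.builder(corpus, 10, "dot_product")
        .tree(num_leaves=1000, num_leaves_to_search=100, training_sample_size=50_000)
        .score_ah(2, anisotropic_quantization_threshold=0.2)
        .reorder(100)
        .build()
    )

    # Single-query search returns the approximate nearest neighbors and their scores.
    neighbors, distances = searcher.search(corpus[0])
    print(neighbors[:5], distances[:5])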
This demo is only about a million vectors. The largest index I had in Faiss was embeddings of the entire Wikipedia (in the neighborhood of ~30 million vectors). I know people running a few billion vectors in Faiss.
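For anyone curious what a multi-million-vector Faiss setup can look like, here is a rough sketch using an IVF index with product quantization to keep memory in check. The parameters and the random data are placeholders, not what this demo actually uses.

    import numpy as np
    import faiss

    dim = 768                                             # BERT-sized embeddings
    xb = np.random.rand(100_000, dim).astype(np.float32)  # stand-in for a large corpus
    xq = np.random.rand(5, dim).astype(np.float32)        # stand-in queries

    # Inverted file with 1024 clusters + 64-byte product quantization codes per vector.
    quantizer = faiss.IndexFlatL2(dim)
    index = faiss.IndexIVFPQ(quantizer, dim, 1024, 64, 8)
    index.train(xb)          # learn the coarse clusters and the PQ codebooks
    index.add(xb)
    index.nprobe = 32        # how many clusters to scan per query

    distances, ids = index.search(xq, 10)
    print(ids)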
So one vector per article? Doesn’t this skew results? A short article with 0.9 relevance score would rank higher than a long article containing one paragraph with 1.0 relevance. Am I mistaken?
Also, BERT on cheap hardware? I thought that without a GPU, vectorizing millions of snippets or doing sub-second queries was basically out of the question.
CPU BERT inference is fast enough to embed 50 examples per second. Your large index is built offline; the query is embedded live, but it's just one at a time. Approximate similarity search complexity is logarithmic, so it's super fast on large collections.
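To make the offline/online split concrete, here is a rough CPU-only sketch with sentence-transformers. The model name and corpus are placeholders, not what Teclis uses, and the throughput you measure will depend on your hardware and model.

    import time
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")

    # Offline: embed the whole corpus in batches (the slow, one-time part).
    corpus = ["example snippet %d about search engines" % i for i in range(500)]
    t0 = time.perf_counter()
    corpus_emb = model.encode(corpus, batch_size=64, convert_to_numpy=True,
                              normalize_embeddings=True)
    print("docs/sec:", len(corpus) / (time.perf_counter() - t0))

    # Online: embed a single query at a time.
    t0 = time.perf_counter()
    query_emb = model.encode(["how does approximate nearest neighbor search work"],
                             convert_to_numpy=True, normalize_embeddings=True)
    print("query latency (ms):", (time.perf_counter() - t0) * 1000)

    # With normalized vectors, cosine similarity is a dot product; a real setup
    # would hand this off to an ANN index (Faiss, ScaNN, ...) instead.
    scores = corpus_emb @ query_emb[0]
    print("best match:", corpus[int(np.argmax(scores))])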
It's about choosing the right Transformer model. There are several models that are smaller, with fewer parameters than bert-base, which give the exact same accuracy as bert-base and can run on a modern CPU in single-digit milliseconds, even with a single intra-op thread. See for example https://github.com/vespa-engine/sample-apps/blob/master/msma...
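A quick way to sanity-check that on your own machine (the model below is a hypothetical pick for illustration, not necessarily the one in the linked sample app):

    import time
    import torch
    from transformers import AutoModel, AutoTokenizer

    # Restrict PyTorch to a single intra-op thread, as described above.
    torch.set_num_threads(1)

    # Hypothetical small (6-layer) model; substitute whatever distilled or
    # pruned model you are comparing against bert-base.
    name = "sentence-transformers/all-MiniLM-L6-v2"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()

    inputs = tokenizer("what is dense retrieval?", return_tensors="pt")
    with torch.no_grad():
        model(**inputs)                        # warm-up run
        t0 = time.perf_counter()
        for _ in range(50):
            model(**inputs)
    print("average forward pass: %.1f ms" % ((time.perf_counter() - t0) / 50 * 1000))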
I compared BERT[1], distilbert[2], mpnet[3] and minilm[4] in the past. But the results I got "out of the box" for semantic search were not better than using fastText, which is orders of magnitude faster. BERT and distilbert are 400x slower than fastText, minilm 300x, and mpnet 700x. At least if you are using a CPU-only machine. USE, xlmroberta and elmo were even worse (5,000 - 18,000x slower).
I also love how fast and easy it is to train your own fastText model.
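For reference, a minimal fastText training and sentence-embedding sketch; the file name and hyperparameters are made up for illustration.

    import fasttext

    # Train an unsupervised model on a plain-text corpus, one sentence or
    # document per line.
    model = fasttext.train_unsupervised("corpus.txt", model="skipgram",
                                        dim=100, epoch=5, minCount=2)

    # fastText sentence vectors are just averaged (subword-aware) word vectors,
    # which is why it is so much faster than transformer encoders.
    vec = model.get_sentence_vector("semantic search with fasttext")
    print(vec.shape)  # (100,)

    model.save_model("semantic.bin")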
Vector models are nothing but representation learning, and applying a model out-of-domain usually gives worse results than plain old BM25. See https://arxiv.org/abs/2104.08663
A concrete example is DPR, a state-of-the-art dense retriever trained on Wikipedia for question answering: when applied to MS MARCO passage ranking, it performs worse than plain BM25.
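For context, BM25 here just means a plain lexical scorer over tokenized text. A tiny illustration with the rank_bm25 package (my own example, not the implementation used in the BEIR or DPR papers):

    from rank_bm25 import BM25Okapi

    corpus = [
        "dense retrievers need in-domain training data to work well",
        "bm25 is a strong lexical baseline for passage ranking",
        "faiss performs approximate nearest neighbor search over vectors",
    ]
    bm25 = BM25Okapi([doc.split() for doc in corpus])

    query = "lexical baseline for passage ranking".split()
    print(bm25.get_scores(query))           # one BM25 score per document
    print(bm25.get_top_n(query, corpus, n=1))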