Ask HN: Does Semantic Search Work?
8 points by yonz on Jan 22, 2023 | 5 comments
Looking through HN posts over the last 30 days, there are at least 15 posts about semantic search. I'm excited about the prospect but haven't seen good results.

Is there any info on query performance? And to builders, did you get better results by incorporating semantic search?




I helped develop a "semantic search" engine for patent and non-patent literature about 10 years ago. It was highly successful, enough that our demo made a big sale on the very first day.

This engine used a neural network trained as an autoencoder that crunches the word counts for thousands of words down to a moderate-dimensional vector, say n=50. This captures correlations between words, so that similar documents are more consistently close in the embedding space than they are in the very high-dimensional word vector space.
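A toy sketch of that idea, not the actual engine: it assumes a purely linear autoencoder, a made-up six-word vocabulary, and a 2-dimensional code instead of n=50. All names here (`train_autoencoder`, `DOCS`, and so on) are hypothetical.

```python
import math
import random

def l2_normalize(vec):
    """Scale a word-count vector to unit length so document length does not dominate."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)

def encode(enc, x):
    """Project a word-count vector down to the low-dimensional code."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc]

def decode(dec, h):
    """Reconstruct the word-count vector from the code."""
    return [sum(w * hi for w, hi in zip(row, h)) for row in dec]

def reconstruction_loss(enc, dec, docs):
    """Mean squared reconstruction error over all documents."""
    total = 0.0
    for x in docs:
        y = decode(dec, encode(enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(docs)

def train_autoencoder(docs, dim=2, epochs=300, lr=0.05, seed=0):
    """Train a linear autoencoder with plain SGD on squared error (assumed setup)."""
    rng = random.Random(seed)
    n = len(docs[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]
    for _ in range(epochs):
        for x in docs:
            h = encode(enc, x)
            err = [yi - xi for yi, xi in zip(decode(dec, h), x)]
            # Backpropagate through the decoder before updating it.
            grad_h = [sum(err[i] * dec[i][j] for i in range(n)) for j in range(dim)]
            for i in range(n):
                for j in range(dim):
                    dec[i][j] -= lr * err[i] * h[j]
            for j in range(dim):
                for i in range(n):
                    enc[j][i] -= lr * grad_h[j] * x[i]
    return enc, dec

# Hypothetical vocabulary: patent, claim, invention, neural, network, training
DOCS = [l2_normalize(d) for d in (
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 1, 2, 3, 2],
)]

enc, dec = train_autoencoder(DOCS)
embedding = encode(enc, DOCS[0])  # 2 numbers standing in for 6 word counts
```

A real system of this kind would presumably use nonlinear layers and a vocabulary of thousands of words; the point here is only the shape of the computation, counts in and a short dense vector out.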

This kind of system does not improve short queries (<10 words) but is great for "more like this" queries centered on a document, and for taking a paragraph you wrote describing an invention and finding prior art.
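For illustration, a "more like this" query can be sketched as a cosine-similarity ranking over document embeddings. The document IDs and 3-dimensional vectors below are invented, and `more_like_this` is a hypothetical helper, not something from the engine described above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def more_like_this(query_id, embeddings, top_k=3):
    """Rank every other document by similarity to the query document."""
    q = embeddings[query_id]
    scored = [(cosine(q, vec), doc_id)
              for doc_id, vec in embeddings.items() if doc_id != query_id]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

# Made-up embeddings: two patent-like documents and two unrelated papers.
EMBEDDINGS = {
    "patent_a": [0.9, 0.1, 0.0],
    "patent_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
    "paper_d": [0.1, 0.0, 0.8],
}

similar = more_like_this("patent_a", EMBEDDINGS, top_k=2)
```

The same function covers the prior-art case if the paragraph describing the invention is embedded first and added to the dictionary as the query document.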

We used the TREC evaluation methodology, public data, our proprietary data, and the opinions of users to conclude our product was much better than a simple baseline search engine and our competitors.
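As a rough sketch of what a TREC-style comparison boils down to: given relevance judgments for a query, score each system's ranked list with metrics such as precision at k and average precision. The document IDs, rankings, and judgments below are invented for illustration and are not results from the product.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs, judgments):
    """Average the per-query average precision over all queries."""
    return sum(average_precision(runs[q], judgments[q]) for q in runs) / len(runs)

# Hypothetical judgments and rankings for a single query "q1".
JUDGMENTS = {"q1": {"d1", "d2"}}
BASELINE = {"q1": ["d3", "d1", "d7", "d2"]}
CANDIDATE = {"q1": ["d1", "d2", "d3", "d7"]}

baseline_map = mean_average_precision(BASELINE, JUDGMENTS)    # 0.5
candidate_map = mean_average_precision(CANDIDATE, JUDGMENTS)  # 1.0
```

With many queries and pooled human judgments, the same numbers give a defensible "better than the baseline" claim.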


I can imagine how painful it would be to rely on keywords to search through patents, so your success makes sense to me.

But I should clarify my question: did we get enough of a comprehension boost from transformer-based pretrained LLMs to successfully interpret query intent and find related items?


I'd think you could evaluate a system like that with the TREC methodology.

I've seen plenty of blog posts where somebody did something poorly motivated with an embedding and ended up with a search engine that worked, but didn't do any real evaluation, so you don't know if it is better or worse than a simple search engine that uses, say, Okapi BM25.
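For reference, a BM25 baseline is small enough to sketch in a few lines. This assumes whitespace tokenization, the commonly used defaults k1=1.2 and b=0.75, and the "+1 inside the log" IDF variant that keeps scores non-negative; the three documents are made up.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not a real analyzer)."""
    return text.lower().split()

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

DOCS = [
    "semantic search with embeddings",
    "keyword search with bm25",
    "cooking pasta at home",
]
scores = bm25_scores("semantic search", DOCS)
```

Running the same queries through this and through the embedding-based engine, then scoring both against relevance judgments, is the kind of comparison those blog posts tend to skip.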


We’ve had good results with semantic search. We use it because keyword search doesn’t handle minor changes in words gracefully and semantic search does.


Would you say it works more like fuzzy text search than searching by meaning?

For example, I expect that searching for "a snarky blog post" or "uplifting people" wouldn't work that well.



