Hacker News new | past | comments | ask | show | jobs | submit login

Just want to say how great I am for calling this out a few months ago https://news.ycombinator.com/context?id=41470605



It's nice to hear that! And from this thread, it is not us only two—otherwise, the title wouldn't have resonated with the Hacker News community.

This blog post stemmed from my frustration that people use cosine distance without a second thought. In virtually all tutorials on vector databases, cosine distance is treated as if it were some obvious ground truth.

When questioned about cosine similarity, even seasoned data scientists will start talking about "the curse of dimensionality" or some geometric interpretations but forget that (more than often) they work with a hack.


Your post was much better than my stupid comment, and I like the points you articulated. Cheers.


You called it! But it is a pattern as old as the hills in the software industry. "Just add an index". "Put it in the cloud" "Do sprints". One size fits all!


That was a helpful list, in your second comment downthread. What are your top 3 metrics that perform the best on the greatest number of those features that make cosine distance perform poorly?


Good question. Unfortunately, I'm just a keyboard warrior asshole that bad mouths things without offering solutions




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: