Thanks for this. We've been doing something similar with the Universal Sentence Encoder en masse (https://www.tensorflow.org/hub/tutorials/semantic_similarity...)

Curious if anyone has recommendations on good stashes or datasets of already-encoded embeddings? This sounds geeky, but to some extent I don't even "care" about the original text; I'd love to just get the embedding vectors and play with those.



BEIR is what you’re looking for :). There should be stashes of vectors for the datasets floating around.

https://paperswithcode.com/paper/beir-a-heterogenous-benchma...
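
For anyone wondering what "playing with the vectors" can look like without the original text, here is a minimal sketch of cosine-similarity nearest-neighbour lookup in plain Python. The function names `cosine` and `nearest` and the toy 3-d vectors are made up for illustration; real Universal Sentence Encoder or BEIR-derived embeddings would be much higher-dimensional and loaded from whatever file format the stash uses.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors, k=2):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]

# Toy placeholder vectors, NOT real embeddings.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Indices of the two vectors closest to vectors[0] (including itself).
print(nearest(vectors[0], vectors, k=2))
```

This brute-force scan is fine for a few thousand vectors; for larger stashes you would probably want a vectorised library or an approximate nearest-neighbour index instead.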



