
MiniLM is a better embedding model. This model does not perform attention calculations or use a deep learning framework after training, so you won't get the contextual benefits of transformer models in this one.

It's not meant to be a state-of-the-art model, though. I've put in pretty tight constraints in order to keep dependencies, size, and hardware requirements low, and speed high.

Even for a word embedding model it's quite lightweight, as those have much larger vocabularies and are typically a few gigabytes.
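
For context, this is roughly what a static word-embedding model does at inference time: a table lookup plus pooling, with no attention and no deep learning framework. A minimal sketch in Python (the vectors.txt file, its word2vec-style text format, and the whitespace tokenizer are assumptions for illustration, not necessarily what this project actually does):

    # Sentence similarity with a static (no-attention) embedding table.
    # Assumes a word2vec-style text file: "word dim1 dim2 ..." per line.
    import numpy as np

    def load_vectors(path):
        vecs = {}
        with open(path) as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vecs[parts[0]] = np.array(parts[1:], dtype=np.float32)
        return vecs

    def embed(text, vecs):
        # Average the vectors of known tokens; no context, no attention.
        toks = [t for t in text.lower().split() if t in vecs]
        return np.mean([vecs[t] for t in toks], axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    vecs = load_vectors("vectors.txt")  # hypothetical vocabulary file
    print(cosine(embed("cheap flights to paris", vecs),
                 embed("budget airfare to france", vecs)))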

Which do use attention? Any recommendations?

Depends immensely on use case: What are your compute limitations? Are you fine with remote code? Are you doing symmetric or asymmetric retrieval? Do you need support in one language or many? Do you need to work on just text, or also audio, video, and images? Are you working in a specific domain?

A lot of people pick a model based purely on one or two benchmark scores and wind up viewing their embedding-based project as a failure.

If you do answer some of those I’d be happy to give my anecdotal feedback :)


Sorry, I wasn't clear. I was asking about utility models/libraries that compute things like meaning similarity using not just token embeddings but attention as well. I'm really interested in finding a good utility that leverages a transformer to compute “meaning similarity” between two texts.

Most current models are transformer encoders that use attention. I like most of the options that ollama provides.

I think this one is currently at the top of the MTEB leaderboard, though it produces large-dimension vectors and is a multi-billion-parameter model: https://huggingface.co/nvidia/NV-Embed-v1
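
For the “meaning similarity between two texts” question above, one common route is the sentence-transformers library, which wraps attention-based transformer encoders. A small sketch using the off-the-shelf all-MiniLM-L6-v2 checkpoint (the model choice here is just an example, not a recommendation for any particular use case):

    # Semantic similarity between two texts with an attention-based encoder.
    # Requires: pip install sentence-transformers
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    a = "How do I reset my password?"
    b = "I forgot my login credentials and need to recover my account."

    # Encode both texts and compare with cosine similarity.
    emb = model.encode([a, b], normalize_embeddings=True)
    print(util.cos_sim(emb[0], emb[1]).item())  # value in [-1, 1]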



