Hacker News: Tananon's comments

Nice, I actually read that Jina article when it was published, but forgot they use facility location as well! The saturated coverage algorithm looks pretty interesting, I'll have a look at how feasible it would be to add that to Pyversity.
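For anyone curious what facility location looks like in practice, the classic greedy variant can be sketched in a few lines. This is an illustrative sketch only, not Pyversity's actual implementation; it assumes `sim` is a precomputed pairwise similarity matrix (e.g. cosine similarities of embeddings) and greedily picks the item that most improves every point's best similarity to the selected set:

```rust
// Greedy facility-location selection (illustrative sketch, not Pyversity's
// implementation). `sim[p][c]` is the similarity between points p and c.
fn facility_location(sim: &[Vec<f32>], k: usize) -> Vec<usize> {
    let n = sim.len();
    let mut selected: Vec<usize> = Vec::new();
    // coverage[p] = best similarity of point p to anything selected so far.
    let mut coverage = vec![0.0_f32; n];
    for _ in 0..k.min(n) {
        let mut best = None;
        let mut best_gain = f32::MIN;
        for c in 0..n {
            if selected.contains(&c) {
                continue;
            }
            // Marginal gain of adding candidate `c`: how much it raises
            // each point's coverage, summed over all points.
            let gain: f32 = (0..n).map(|p| (sim[p][c] - coverage[p]).max(0.0)).sum();
            if gain > best_gain {
                best = Some(c);
                best_gain = gain;
            }
        }
        let c = best.unwrap();
        selected.push(c);
        for p in 0..n {
            coverage[p] = coverage[p].max(sim[p][c]);
        }
    }
    selected
}

fn main() {
    // Items 0 and 1 are near-duplicates; item 2 is distinct, so a diverse
    // selection of two items should be one of {0, 1} plus item 2.
    let sim = vec![
        vec![1.0, 0.9, 0.0],
        vec![0.9, 1.0, 0.0],
        vec![0.0, 0.0, 1.0],
    ];
    println!("{:?}", facility_location(&sim, 2));
}
```

The greedy approach works well here because the facility-location objective is submodular: each pick has diminishing returns, so greedy selection comes with a constant-factor approximation guarantee.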


That's indeed something I plan to add in the near future. I'll probably add a tutorial as well to showcase how you can use this with e.g. sentence transformers. There are some pretty good benchmarks in the paper I used as inspiration for some of these algorithms (https://arxiv.org/pdf/1709.05135); I'll most likely try to reproduce some of them.


True, I think that's also a great use case! These algorithms likely won't scale to very large datasets (e.g. millions of samples), but for smaller datasets, like fine-tuning sets, I think this would work very well. I've worked on something similar in the past that does handle larger datasets (semantic deduplication: https://github.com/MinishLab/semhash).
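The core idea behind semantic deduplication can be sketched in a few lines. This is a toy O(n²) version for illustration only (not semhash's implementation, which needs smarter indexing to scale): keep an item only if its cosine similarity to every previously kept item stays below a threshold.

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

// Toy semantic dedup: returns indices of kept items. An item is dropped
// if it is too similar (>= threshold) to anything already kept.
fn dedup(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for (i, e) in embeddings.iter().enumerate() {
        if kept.iter().all(|&j| cosine(e, &embeddings[j]) < threshold) {
            kept.push(i);
        }
    }
    kept
}

fn main() {
    // Item 1 is a near-duplicate of item 0; item 2 is orthogonal to both.
    let embeddings = vec![
        vec![1.0, 0.0],
        vec![1.0, 0.1],
        vec![0.0, 1.0],
    ];
    println!("{:?}", dedup(&embeddings, 0.9));
}
```

A real implementation replaces the inner loop with approximate nearest-neighbour search, which is what makes this feasible at the million-sample scale.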


I think you are referring to "for batch in sentences.chunks(batch_size)"? That's not actually chunking the sentences themselves: chunks() simply iterates over non-overlapping sub-slices (in this case, sub-slices of our input sentences, each of length batch_size). We don't have an actual constraint on input length. We truncate to 512 tokens by default, but you can easily set that to any amount by directly calling encode_with_args. There's an example in our quickstart: https://github.com/MinishLab/model2vec-rs/tree/main?tab=read....
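To make the distinction concrete, here's a minimal standalone sketch of what that loop does (plain std Rust, nothing model2vec-specific):

```rust
fn main() {
    let sentences = vec!["s1", "s2", "s3", "s4", "s5"];
    let batch_size = 2;

    // chunks() yields non-overlapping sub-slices of at most `batch_size`
    // elements; the final batch may be shorter. No sentence is ever split.
    for batch in sentences.chunks(batch_size) {
        println!("{:?}", batch);
    }
    // Prints ["s1", "s2"], then ["s3", "s4"], then ["s5"].
}
```

So the batching only controls how many whole sentences are encoded per call; any per-sentence length limit (the 512-token default) is handled separately by the tokenizer.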


Awesome to hear! It's great to see the Rust ML ecosystem growing, and we hope we can be a small part of it. Don't hesitate to reach out with any ideas or requests!


We support loading from both local paths and Hugging Face paths with from_pretrained! For example, let model = StaticModel::from_pretrained("my_custom_model", None, None, None)?; will load a local model.


Indeed, I also didn't expect it to be so much faster! I think it's because most of the time is actually spent on tokenization, which also happens in Rust in the Python package, but with some transfer overhead between Rust and Python. The other operations should run at roughly the same speed.


It depends a bit on the task and language, but my go-to is usually minishlab/potion-base-8M for every task except retrieval (classification, clustering, etc.). For retrieval, minishlab/potion-retrieval-32M works best. If quality is critical, minishlab/potion-base-32M is best, although it's a bit bigger (~100 MB).

There's definitely a quality trade-off. We have extensive benchmarks here: https://github.com/MinishLab/model2vec/blob/main/results/REA.... potion-base-32M reaches ~92% of the performance of MiniLM while being much faster (about 70x faster on CPU). It depends a bit on your constraints: if you have limited hardware and very high throughput, these models still let you make decent-quality embeddings. An attention-based model will of course be better, but also more expensive.


Thanks man, this is incredible work; really appreciate the details you went into.

I've been chewing on whether there was a miracle that could make embeddings 10x faster for my search app that uses minilmv3; sounds like there is :) I never would have dreamed. I'll definitely be trying potion-base in my library for Flutter x ONNX.

EDIT: I was thanking you for thorough benchmarking, then it dawned on me you were on the team that built the model - fantastic work, I can't wait to try this. And you already have ONNX!

EDIT2: Craziest demo I've seen in a while. I'm seeing 23x faster, after 10 minutes of work.


Thanks so much for the kind words, that's awesome to hear! If you have any ideas or requests, don't hesitate to reach out!

