They perform different roles, so they're not directly comparable.
Jina V3 is an embedding model: a base model further fine-tuned specifically for embedding-style tasks (retrieval, similarity...). This is what we call a "downstream" model/application.
ModernBERT is a base model & architecture. It's not meant to be used out of the box, but rather fine-tuned for other use cases, serving as their backbone. In theory (and, given early signal, most likely in practice too), it'll make for really good downstream embedding models once people build on top of it!
Isn't that what happened to one of these guys? Reportedly, one of them caught the fungus just from inspecting his attic (he hadn't used the guano yet).
Reportedly, one of them had a bat infestation that produced a thick layer of guano in his attic, which he planned to use as fertilizer. He probably should have inspected his attic wearing an N95 respirator.
> I mind that resources about how to use crypto in software applications are often inscrutable, all the way down to library design, for no good reason.
There's a book titled "Cryptography Engineering: Design Principles and Practical Applications" that could help you. I haven't read it yet, but I plan to, eventually.
> Seems like we still have a long way to go after Adam...
A preprint on arXiv suggests that Adam works better than SGD for training LLMs due to class imbalance [0]. It appears that scaling the gradient step helps with training; for another approach along those lines, see [1].
0. https://arxiv.org/abs/2409.10173
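To make "scaling the gradient step" concrete, here's a minimal sketch (not the preprint's method, just the textbook Adam and SGD update rules) showing how Adam's division by the running second moment roughly normalizes the per-parameter step size, so rarely-updated parameters (e.g. rare-token embeddings) move about as far as frequently-updated ones:

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    # Plain SGD: the update is proportional to the raw gradient,
    # so parameters with tiny gradients barely move.
    return w - lr * g

def adam_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: exponential moving averages of the gradient (m) and its
    # square (v); dividing by sqrt(v_hat) normalizes the step size
    # per parameter, which is the "gradient scaling" in question.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)   # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Two gradients differing by 1000x (think: frequent vs. rare class):
g = np.array([1.0, 0.001])
w_sgd = sgd_step(np.zeros(2), g)
w_adam, _, _ = adam_step(np.zeros(2), np.zeros(2) + g - g + g, np.zeros(2), np.zeros(2), t=1)
# SGD's two steps differ by 1000x; Adam's are nearly identical in magnitude.
```

SGD's update on the small-gradient coordinate is 1000x smaller, while Adam's first-step update is ≈ lr * sign(g) for both coordinates.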