Interestingly, o3-mini-high was correct when first thinking about it:
> Okay, we're asked how to get exactly 6 liters of water using a 12-liter and a 6-liter jug. The immediate thought is to just fill the 6-liter jug, but that seems too simple, doesn’t it? So maybe there’s a trick here. Perhaps this is a puzzle where the challenge is to measure 6 liters with some pouring involved. I’ll stick with the simple solution for now—fill the 6-liter jug and stop there.
I have to take all these comparisons with a heap of salt because no one bothers to run the test 20 times on each model to smooth out the probabilistic nature of the LLM landing on the right answer. There must be a name for the fallacy of sampling once from each model and declaring a definitive winner; I see it all the time.
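If I were going to do it properly, the harness is tiny. A bare sketch of what I mean, with a toy stand-in where the real API call would go (any actual eval would swap in its own client):

```python
from collections import Counter

def estimate_accuracy(ask_model, prompt, is_correct, n_trials=20):
    """Ask the same question n_trials times and score each answer,
    instead of declaring a winner from a single draw.

    ask_model:  callable(prompt) -> str, wraps whatever API you actually use
    is_correct: callable(answer) -> bool, task-specific grader
    """
    answers = [ask_model(prompt) for _ in range(n_trials)]
    accuracy = sum(is_correct(a) for a in answers) / n_trials
    return accuracy, Counter(answers)

if __name__ == "__main__":
    import random
    # Toy stand-in "model" that lands on the right answer ~70% of the time.
    toy_model = lambda prompt: "6" if random.random() < 0.7 else "12"
    acc, counts = estimate_accuracy(toy_model, "jug puzzle", lambda a: a == "6")
    print(acc, counts)
```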
They perform different roles, so they're not directly comparable.
Jina V3 is an embedding model: a base model further fine-tuned specifically for embedding-style tasks (retrieval, similarity...). This is what we call "downstream" models/applications.
ModernBERT is a base model & architecture. It's not meant to be used out of the box; it's meant to be fine-tuned for other use cases, serving as their backbone. In theory (and, given early signal, most likely in practice too), it'll make for really good downstream embeddings once people build on top of it!
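To make "build on top of it" concrete, here's a rough sketch of pulling embeddings out of a ModernBERT checkpoint with mean pooling; the model id and the pooling choice are just my assumptions, and a real embedding model would add a contrastive fine-tuning stage on top of this:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)         # (batch, seq, 1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)        # mean over real tokens only
    return torch.nn.functional.normalize(pooled, dim=-1) # unit-length vectors

a, b = embed(["ModernBERT is a new encoder", "a recipe for tomato soup"])
print((a @ b).item())  # cosine similarity, since vectors are normalized
```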
Isn't that what happened to one of these guys? It's reported that one of them caught the fungus while inspecting his attic (he hadn't even used the guano yet).
Reportedly, one of them had a bat infestation that produced a thick layer of guano in his attic, which he planned to use as fertilizer. He probably should have inspected his attic wearing an N95 respirator.
> I mind that resources about how to use crypto in software applications are often inscrutable, all the way down to library design, for no good reason.
There's a book titled "Cryptography Engineering: Design Principles and Practical Applications" that could help you. I haven't read it myself, but I plan to, eventually.
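For what it's worth, the kind of API design that complaint is pointing at doesn't have to be inscrutable. As an illustration (not from the book), the Fernet recipe in the pyca/cryptography package hides the footguns behind a handful of calls:

```python
# Authenticated symmetric encryption via the Fernet recipe (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()           # 32 random bytes, url-safe base64-encoded
f = Fernet(key)

token = f.encrypt(b"attack at dawn")  # AES-CBC + HMAC under the hood, with a timestamp
plaintext = f.decrypt(token)          # raises InvalidToken if tampered with or wrong key
assert plaintext == b"attack at dawn"
```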
> Seems like we still have a long way to go after Adam...
A preprint on arXiv suggests that Adam works better than SGD for training LLMs due to class imbalance [0]. It appears that scaling the gradient step helps with training; for example, see another approach suggested in [1].
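For reference, the update rule these papers are poking at fits in a few lines. A plain numpy sketch of one standard Adam step (not the variant from [1]); the per-coordinate scaling by sqrt(v_hat) is what distinguishes it from plain SGD:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update for parameters theta at step t (1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad      # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad**2   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1**t)              # bias correction for the zero-initialized moments
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```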