Hacker News | past | comments | ask | show | jobs | submit | amunozo's comments

Did you compare it with Kimi K2.6 and DeepSeek V4 Pro? I feel they're similar, but as GLM is more expensive, I am not using it much.

Drinking coffee for caffeine is pathetic, in my humble opinion.

I am using both on the OpenCode Go plan and they're pretty good, but I would say still not at the same level as GPT-5.5 in my experience. I don't know about Opus.

On a different note, is Ollama cloud good?


> is Ollama cloud good?

I'd say they have reliability issues but for the price it's worth it.

I like that usage isn't measured per token but per computation time, which means you get more usage as models become more efficient.


Location: Lausanne, Switzerland (EU citizen)

Remote: Yes

Willing to relocate: Within Switzerland

Technologies: Python, PyTorch, Hugging Face, Docker, Kubernetes, Slurm, Run:ai, LoRA/QLoRA, RAG, LLM evaluation

Website: https://amunozo.com

Résumé/CV: Available upon request

Email: Available upon request.

GitHub: https://github.com/amunozo

I'll finish my PhD in NLP this June. During my PhD, I built NLP systems that work under limited data and developed methods to evaluate what language models learn and how they behave.

Now I'm looking for a research engineer or applied scientist role working on LLM evaluation and efficient training. Available from July 1st.


It's Google's turn to release something. If I'm not mistaken, it's the one big lab that did not release a big model in the last month.

Google released Gemma 4 recently, and it got quite good reviews from the local-models community.

That's why I said "big model" (i.e., Gemini Pro). But yes, I had forgotten about Gemma.

They have always released slowly, and their releases are usually tagged "preview".

I'm testing it right now and it seems very buggy and unstable, just like before.

This is the bar for anybody that's not the frontier labs.

Price and speed.

I want to believe it's going to be good, but after trying GPT-5.5, even the most advanced Chinese models seem disappointing.

I am not following this obsession with SOTA and benchmark rankings.

I have been using DeepSeek and GLM models with OpenCode, and Codex and Claude side by side.

I have not found the Chinese models lacking. I enjoy them for coding, like to maintain full control of my codebase, and deeply care about the GoF patterns. So I am very stringent about what I want the LLM to code and how to code it.

So from my perspective, they are all about the same.


That I agree with, but for more complex autonomous changes the differences are considerable. However, it seems that most models will reach a saturation point at which they are useful for almost everything, and the differences will show up only in increasingly niche and specialized tasks.

This is a French model, sir

Évidemment ("obviously")

Funny detail: Google AI (the one they use in search) can't spell évidemment correctly.


What's French for 'goblin'...?

Then you’ll be happy to learn it’s not Chinese

GP is stating that the second best in the field, the Chinese models, are so far behind the best in the field, GPT 5.5, that it is not even worth testing anything else.

Thanks for the translation; I did not express it very clearly. Anything else I try is so much worse.

Is GPT 5.5 the best in the field? I think Opus is still better despite Anthropic's recent stumbling.

I have not tried Opus much recently, as I had a Codex subscription and heard bad things, but Opus is super good too. Let's say it's on par with any of them.

Honestly, it depends on the context in which this performance matters. Mistral is quite cheap.

For certain tasks that are not hard but depend on a clear specification, it's even better to have a less capable model, because it forces you to write a better description of what you want, ending up with better results. I will defend my PhD thesis soon, and I will buy a yearly Mistral subscription at the student price to get it for cheap.
