I am using both on the OpenCode Go plan and they're pretty good, but I would say still not at the same level as GPT-5.5 in my experience; I don't know about Opus.
I'll finish my PhD in NLP this June. During my PhD, I built NLP systems that work under limited data and developed methods to evaluate what language models learn and how they behave.
Now I want a research engineer or applied scientist role working on LLM evaluation and efficient training. Available from July 1st.
I don't share this obsession with SOTA and benchmark rankings.
I have been using DeepSeek and GLM models with OpenCode, and Codex and Claude side by side.
I have not found the Chinese models lacking. I enjoy them for coding, like to maintain full control of my codebase, and deeply care about the GoF patterns. So I am very stringent about what I want the LLM to code and how to code it.
So from my perspective, they are all about the same.
That I agree with, but for more complex autonomous changes the differences are considerable. However, it seems that most models will reach a saturation point at which they are useful for almost everything, and the differences will show up only in increasingly niche and specialized tasks.
GP is stating that the second best in the field, the Chinese models, are so far behind the best in the field, GPT-5.5, that it is not even worth testing anything else.
For certain tasks that are not hard but depend on a clear specification, it's even better to have a less capable model, because it forces you to write a better description of what you want, and you end up with a better result. I will defend my PhD thesis soon, and I will buy a yearly Mistral subscription at the student price to get it for cheap.