GLM-4.7 in opencode is the only open-source one that comes close in my experience, and they probably did use some Claude data, as I see the occasional "You're absolutely right" in there.
I got their z.ai plan to test alongside my Claude subscription; it feels like something between Sonnet 4.0 and Sonnet 4.5. It's definitely a few steps below current-day Claude, but it's very capable.
Dumbness usually comes from lack of information; humans are the same way. The difference from other LLMs is that if Opus has the information, it has ridiculously high accuracy on tasks.
z.ai (Zhipu AI) is a Chinese-run entity, so presumably China's National Intelligence Law, in effect since 2017, which requires data exfiltration back to the government, would apply to the use of this. I wouldn't feel comfortable using any service that has that fundamental requirement.
Google, OpenAI, Anthropic, and Y Combinator are US-run entities, so presumably the CLOUD Act and FISA, which require data exfiltration back to the government when asked, on top of all the "Room 641A"s where the NSA directly taps into the ISP interconnects, would apply to the use of them. I wouldn't feel comfortable using any service that has that fundamental requirement.
I wouldn't use any provider (z.ai, Claude, OpenAI, ...) if I were concerned about the government obtaining my prompts. If you're doing something where this is a legitimate concern (as opposed to my open-source stuff), you should use a local LLM or put a lot of effort into anonymizing yourself and your prompts.
I agree completely; I meant in terms of open-source ones only.
Opus 4.5 is the current SOTA, and using it in Claude Code is an absolutely amazing experience.
But paying nothing to test GLM-4.7 with opencode feels like an amazing deal! I don't use it for work, though. But to keep "gaining experience" with these agents and tools, it's by far the best option out there of all I've tried.
Claude spits that out very regularly at the end of an answer when it's clearly out of its depth and wants to steer the discussion away from that blind spot.
Perhaps being more intentional about adding a use case to your original prompts would make sense if you see that failure mode frequently? (Treating LLM failures as prompting errors tends to give the best results, even if you feel the LLM "should" have worked with the original prompt.)