I really don't think they're being oversold that much. I'm running Llama 3 8B on my machine, and it feels a lot like running Claude 3 Haiku with a much lower context window. Quality-wise it is surprisingly nice.
Yeah I just mentioned Llama to point out that the open weight models have been really catching up.
Microsoft is almost certainly using GPT-4 given their relationship with ClosedAI, but I definitely wouldn't put GPT-4 (or Turbo) "two massive steps up" from Claude 3 Opus. I have access to both through Kagi, and I've found myself favoring Claude's responses to the point where I almost never use GPT(TM) anymore.
You're misreading me in multiple ways, maybe in a rush to dunk on "Closed AI".
GitHub Copilot is not the same as Copilot Chat, which uses GPT-4. There's still some uncertainty about whether Copilot completions use GPT-4 as outsiders know it (and IIRC they've specifically said at some point that it doesn't).
I also said Haiku is two massive steps behind Anthropic's offerings... which are Sonnet and Opus.
Anthropic isn't any more open than OpenAI, and I personally don't attribute any sort of virtue to any major corporation, so I'll take what works best.
I... don't think I misread you? Maybe you didn't mean what you wrote, but what you said was:
> Github is likely using a GPT-4 class model which is two (massive) steps up in capabilities in Anthropic's offerings alone
Comparing GPT-4 to Anthropic's offerings, which, as you say, include Sonnet and Opus.
> Anthropic isn't any more open than OpenAI, [...] so I'll take what works best
I understand that, and same here. I don't prefer Claude for any reason other than the quality of its output. I just think OpenAI's name is goofy given how they actually behave, so I prefer the more accurate derivative of their name :)
Regarding what model Copilot completions use - point taken; I have no comment on that. My original comment in this thread was only meant to point out that open-weight models are getting a lot better. Not saying they're using them.
I think benchmarks are severely overselling what open-source models are capable of compared to closed-source models.