And they mentioned at the end of the presentation that they're already planning their next datacenter, which will require 5x the power. Not sure if that means the equivalent of ~1,000,000 of the current GPUs, or more, since next-gen Nvidia chips are more efficient.
I keep hearing about Claude's impressive coding skills (relative to its benchmark scores), yet that's not evident to me (I use the web version, not Cline). Compared to 4o it's not that great.
My pet theory is that Sonnet was trained really cleverly on a lot of code that resembles real-world cases.
In our small and humble internal evals it regularly beats every other frontier model on some tasks. The shape of capability is really not intuitive or one-dimensional.
What are you using it for in general? IME the reason Claude pulls out ahead is that when you use it in a larger existing codebase, it keeps everything "in the style" of that codebase and doesn't veer off into weird territory like all the others.
My experience as well. Working in Scala primarily, it tends to be very good at following the constructs of the project.
Using a specific monad transformer regularly? It'll use that pattern, and often very well, handling all the wrapping and unwrapping needed to move data types around (at least well enough that the odd case where it misses some wrapping/unwrapping is easy to spot and fix).
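For anyone unfamiliar with the pattern, here's a minimal sketch of the kind of wrapping/unwrapping I mean, assuming a project built on cats' EitherT over Future; the service and case class names are made up purely for illustration, not from any real codebase:

```scala
import cats.data.EitherT
import cats.implicits._
import scala.concurrent.{ExecutionContext, Future}

final case class User(id: Long, name: String)
final case class Order(id: Long, userId: Long)

// Hypothetical service: plain Future-based lookups get lifted into EitherT
// so failures carry an error message instead of a bare None.
class OrderService(implicit ec: ExecutionContext) {
  def findUser(id: Long): Future[Option[User]] =
    Future.successful(Some(User(id, "demo")))

  def findOrders(user: User): Future[List[Order]] =
    Future.successful(List(Order(1L, user.id)))

  // The transformer does the wrapping/unwrapping, so the happy path
  // reads like an ordinary for-comprehension.
  def ordersFor(userId: Long): EitherT[Future, String, List[Order]] =
    for {
      user   <- EitherT.fromOptionF(findUser(userId), s"No user $userId")
      orders <- EitherT.liftF[Future, String, List[Order]](findOrders(user))
    } yield orders
}
```

The point is that once a codebase consistently lifts values into the transformer like this, Claude tends to keep doing it the same way, whereas the misses are usually a forgotten `liftF`/`fromOptionF`, which is trivial to spot.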
Give a custom GPT or Gem the same source files, and those models regularly fail to maintain style and context, often suggesting solutions that might be fine in isolation but make little sense in the context of the larger codebase. It's almost as if they never reliably refer to the code included in the project/GPT/Gem.
Claude, on the other hand, is so consistent about referring to existing artifacts that, as you approach the limit of project size (which is admittedly small), you can use up your entire 5-hour block of credits with just a few back-and-forths.
Lol, no company is making money using 4o; thanks to Claude Sonnet, though, programs like Cursor are actually usable. 4o agents suck, just try it instead of talking.
I can honestly tell you from my experience that Sonnet 3.5's coding skills did things last summer that no other model got right, even though the benchmarks showed it wasn't the best performer on coding tasks.
I prototyped something over the weekend and started out with 4o because I had a subscription running.
After an hour and only a half-assed working result, I put everything into Claude, and it made it significantly better on the first try, even though I didn't have an active Claude subscription.
Really interesting. I used it today and still hit lots of issues. Maybe my Python notebook approach is too complicated for Sonnet? It couldn't fix a complex custom seaborn plot. 4o failed too. o3-mini-high, on the other hand, managed to do it really well.
There is honestly no rhyme or reason to all these opinions. Someone was telling me just the other day that Claude is for sure the best; multiple people, actually.
I find it concerning that there are no really accurate benchmarks for this stuff that we can all agree on.
Anthropic's best model is Sonnet 3.5, in my opinion. The reason it's good is that it is very effective for the price, and fast (I do think Google has caught up a lot in this regard). However, not having CoT makes its results worse than similarly cheap CoT-based models.
Leaderboards don't care about cost; they largely rank a combination of accuracy and speed. Anthropic has fallen behind Google on accuracy (again, missing CoT), and frankly behind Google in raw speed as well.
No idea why this was downvoted, but you are correct.
Seems like the team at xAI caught up to OpenAI very quickly, reaching the top of the leaderboard on one of the benchmarks and also catching up on features with Grok 3.
Giving credit where credit is due, even though this is a race to zero.
Yeah, so many people aren't capable of thinking clearly about anything Musk-adjacent. It's insane how quickly xAI went from not existing to the top of the benchmarks.
Depends what you mean by "people here". I mean, obviously the majority of HN commentators and even the majority of commentators on this thread seem to be. But there will always be a couple of slightly unhinged folk in a big enough group of readers.
I'm not sure what you mean here. Musk has a history of doing incredibly useful and cool things, and also incredibly dumb, cruel, and, for some people, even terrible things. That context should be part of any clear thinking about him. He does not get a clean slate in every new discussion.
There are widespread, legitimate concerns about what kind of person Elon Musk is turning out to be. There is a lot of chatter about fears of China's AI rise, but what happens if we get Elon's brand of cruelty and lack of empathy in an authoritarian superintelligent AI? Is that the AI future we want? Can you imagine an SAI with real power that interacts with people the way Elon does on Twitter? I am not sure that is a future I want to live in.