I had been messing around with LLMs all day and had a few test cases open, so I asked each model to change a few things in a ~6 KB C# snippet, phrased in a somewhat ambiguous but reasonable way.
GPT-4 did the job perfectly. Qwen:72b did half of the job, completely missed the other half, and renamed a variable that had nothing to do with the request. Llama3.1:70b behaved very similarly to Qwen, which is interesting.
OpenCoder:8b started reasonably well, then randomly replaced "Split('\n')" with "Split(n)" in unrelated code, and then went completely berserk, hallucinating non-existent Stack Overflow pages and answers.
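For reference, a minimal C# sketch of why that substitution is wrong (the strings and variable names here are mine, not from the original snippet): Split('\n') splits on the newline character, while Split(n) refers to some variable n that may not even exist.

    string text = "first line\nsecond line";

    // Correct: splits the string on the newline character.
    string[] lines = text.Split('\n');   // ["first line", "second line"]

    // What OpenCoder produced: this only compiles if a variable `n`
    // happens to be in scope, and then it splits on whatever `n` holds,
    // silently changing behavior.
    // string[] broken = text.Split(n);  // otherwise: error CS0103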
My best guess is that you shouldn't train a model mostly on code. The natural-language conversations used to train other models let them "figure out" human-like reasoning. If your training set is mostly code, the model can produce output that looks like code, but that output will have little value to humans.
Edit: to be fair, llama3.2:3b also botched the code, but at least it did not hallucinate complete nonsense.
Here is a quite comprehensive LLM coding leaderboard: https://aider.chat/docs/leaderboards/
And they update it quite quickly with new model releases.
For posterity, I saved the OpenCoder output here: https://pastebin.com/VRXYFpzr