Hacker News: manamorphic's comments

Crazy how most people miss this simple logical deduction.


I've heard conflicting things about it. Some claim it was trained to do well on benchmarks but falls short in real-world scenarios. Can somebody confirm or deny?


What else should you train for? If the benchmark doesn't represent real-world scenarios, isn't that a problem with the benchmark rather than the model?


If your benchmark covers all possible programming tasks, then you don't need an LLM; you need search over your benchmark.

Hypothetically, let's say the benchmark contains "test divisibility of this integer by n" for every n of the form 3x+1. An extremely overfit LLM won't be able to write a divisibility check for any n not of the form 3x+1, and your benchmark will never tell you.
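To make the hypothetical concrete, here's a minimal Python sketch. `overfit_is_divisible` is a made-up stand-in for an overfit model (not how any real model behaves): it is correct only for n of the form 3x+1 and returns a useless constant otherwise. A benchmark that only samples those n gives it a perfect score, while held-out n expose it. The ranges (n up to 30, values 0 to 99) are arbitrary choices for illustration.

```python
def overfit_is_divisible(value: int, n: int) -> bool:
    # Hypothetical overfit "model": it only ever saw n of the form 3x+1,
    # so it is correct there and falls back to a constant answer elsewhere.
    if n % 3 == 1:
        return value % n == 0
    return False


def score(fn, ns, values=range(100)) -> float:
    """Fraction of (value, n) cases where fn agrees with the true answer."""
    cases = [(v, n) for n in ns for v in values]
    correct = sum(fn(v, n) == (v % n == 0) for v, n in cases)
    return correct / len(cases)


# The benchmark only covers n = 3x+1: 1, 4, 7, ..., 28
benchmark_ns = [n for n in range(1, 31) if n % 3 == 1]
# Held-out divisors the benchmark never checks: 2, 3, 5, 6, ...
held_out_ns = [n for n in range(2, 31) if n % 3 != 1]

print(score(overfit_is_divisible, benchmark_ns))  # 1.0
print(score(overfit_is_divisible, held_out_ns))   # noticeably below 1.0
```

The benchmark number alone can't distinguish this from a real divisibility check; only the cases outside the benchmark's distribution reveal the difference.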


No, because solving a well-defined problem with a clear right or wrong answer is generally not what people use LLMs for. Most of the time my query to an LLM is underspecified, and often I only figure out the problem while chatting with it. A benchmark, by definition, only measures whether an answer is right or wrong.


This is called Goodhart's law. Goodhart's original formulation was: "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes."

But in modern usage it is often rephrased as: "When a measure becomes a target, it ceases to be a good measure."

https://en.wikipedia.org/wiki/Goodhart%27s_law


Overfitting is a concern.



It's small... and for that size, it does very well. I've been using it for a few days and it's quite good given its size and the fact that you can run it locally. So I'm not sure what you say is true; for us it works really well.


I tried Qwen2.5 32B a couple of weeks ago. It was amazing for a model that can run on my laptop, but far from Claude/GPT-4o. I am downloading the coder-tuned version now.


I tried Qwen and it is surprisingly good; maybe not as good as Claude, but it could replace it.


In all fairness, as an EU citizen, if I were American I'd vote for Trump. The Harris campaign was very weak and built on identity politics, which, as we saw, is a double-edged sword. And why should the USA care about Europe? I mean, yes, it's unfortunate for Ukraine, but it's not necessarily their problem, no matter how much people make it out to be. We in Europe need to grow some balls and stop depending on who the next US president is.


> unfortunate for Ukraine, but it's not necessarily their problem

It seems like it absolutely is the US's problem, albeit indirectly. If Russia gets the outcome it wants in Ukraine, they'll have access to rich mineral deposits, vast quantities of grain, and nuclear power, boosting their economy and their status as a rival world power to the US. It will signal to Putin that he can be aggressive towards other neighbouring countries with little pushback. The war has resulted in a growing alliance between Russia, Iran and North Korea which is altering global military power dynamics and not in the US's favour. Also, China is watching what's happening with eagle eyes to determine whether to invade Taiwan, which would definitely escalate the US's engagement.


Ran it in a Windows Sandbox... doesn't work. It messes up the coordinates and can't click on anything.


I'm experiencing the same on Mac. It claims to be clicking and doing things, but it isn't. (Yes, I gave it the necessary permissions.)


I wonder if it's expecting a default resolution (like a MacBook Pro's?). I'm seeing the same issue with the coordinates not working on Win11 with a 3840x2160 display.


Maybe it scales the image before recognition and forgets to scale back up the projected coordinates for actions?
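If that guess is right, the bug would look something like this minimal Python sketch. It assumes the tool shrinks the screenshot to fit a hypothetical 1280x800 limit (the real tool's limit isn't known here) while keeping the aspect ratio, and that the model returns click coordinates in the shrunken image's pixel space. `downscale_size` and `to_screen_coords` are illustrative helpers, not functions from the actual tool.

```python
from typing import Tuple

Size = Tuple[int, int]


def downscale_size(screen: Size, max_size: Size) -> Size:
    """Size of the screenshot after shrinking it to fit max_size, keeping aspect ratio."""
    sw, sh = screen
    mw, mh = max_size
    factor = min(mw / sw, mh / sh, 1.0)  # never upscale small screens
    return round(sw * factor), round(sh * factor)


def to_screen_coords(point: Size, image: Size, screen: Size) -> Size:
    """Map a point predicted on the downscaled image back to physical screen pixels."""
    x, y = point
    iw, ih = image
    sw, sh = screen
    return round(x * sw / iw), round(y * sh / ih)


screen = (3840, 2160)                          # the 4K display from the comment above
image = downscale_size(screen, (1280, 800))    # hypothetical model input limit
predicted = (640, 360)                         # model wants the center of the image

buggy = predicted                              # forgot to scale back up
fixed = to_screen_coords(predicted, image, screen)
print(image, buggy, fixed)  # (1280, 720) (640, 360) (1920, 1080)
```

Without the scale-back step, every click lands at a third of the intended distance from the top-left corner on this display, which would match "can't click on anything" on high-resolution screens while possibly working on a screen close to the model's input size.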

