Hacker Newsnew | past | comments | ask | show | jobs | submit | tipsytoad's commentslogin

Throughput is a metric for the total number of tokens/sec for all users in the system. Latency (ITL,TTFT) are individual user metrics.

The guys previous post was how rabbit r1 is revolutionising the smartphone. So I would take this post with a grain (heap?) of salt

Made it to the 5th stage of grief

While I large agree, when I rely too much on agentic llm usage I come away feeling that I haven’t really learnt much over the session, and the code wasn’t really “mine”. It’s also easy to let your skills atrophy over time if you’re not careful, and for the hardest / interesting problems I often turn the llm off entirely and write out the code by hand, and come out a lot happier than just guiding Claude

Someone not familiar with the field rediscovering the stochastic parrot argument from 3+ years ago


Clearly not from the UK. By US standards Labour would be socialist, and conservative (right) liberal at best.


It’s a quite deceptive paper. The main headline benchmarks (math500, aime24 /25) final answer is just a number from 0-1000, so what is the takeaway supposed to be for pass@k of 512/1024?

On the unstructured outputs, where you can’t just ratchet up the pass@k until it’s almost random, it switches the base model out for instruct, and in the worse case on livecodebench it uses a qwen-r1-distill as a _base_ model(!?) that’s an instruct model further fine tuned on R1’s reasoning traces. I assume that was because no matter how high the pass@k, a base model won’t output correct python.


I get the same feeling that I'm "not being productive" while playing video games, watching tv, etc that seems to kill any enjoyment from doing these things.

For me learning piano has been a great alternative to programming in the off hours (typing is quite transferrable too!). Highly recommend if you're like me on screens all day.


Like, PyTorch? And the new Mac minis have 512gb of unified memory


I usually am a huge fan of “copilot” tools (I use cursor, etc) and Claude has always been my go to.

But Sonnet 3.7 actually seems dangerous to me, it seems it’s been RL’d _way_ too hard into producing code that won’t crash — to the point where it will go completely against the instructions to sneak in workarounds (e.g. returning random data when a function fails!). Claude Code just makes this even worse by giving very little oversight when it makes these “errors”


this is a huge issue for me as well. It just kind of obfuscates errors and masks the original intent, rather than diagnosing and fixing the issue. 3.5 seemed to be more clear about what it's doing and when things broke at least it didn't seem to be trying to hide anything.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: