naasking's comments

> It's not uncommon to see a gemma vs qwen comparison, where qwen does a bit better, but spent 22 minutes on the task, while gemma aligned the buttons wrong, but only spent 4 minutes on the same prompt.

Yes, Gemma 4 is very promising for its strong performance and token efficiency, but it's unfortunate that its sliding-window attention has a fatal flaw that makes me seriously hesitate to rely on it. See the series of videos on this channel:

https://youtu.be/ONQcX9s6_co?si=Yt55_N4DcNLstnGS

On top of Qwen3.5/3.6's superior recall, its attention mechanism dramatically reduces KV cache requirements, so you can fit longer sessions in the same VRAM (or more concurrent sessions if you have agents running), which is critical for local hosting.
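The KV-cache savings are easy to put in rough numbers. A back-of-envelope sketch, assuming fp16 and purely illustrative layer/head counts (not the actual Qwen or Gemma configurations):

```python
# Back-of-envelope KV-cache sizing in fp16. The layer and head counts
# below are illustrative assumptions, not real model configurations.
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per=2):
    # 2 tensors (K and V) per layer, per cached token
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per

# Full multi-head attention: as many KV heads as query heads
mha = kv_cache_bytes(layers=48, kv_heads=32, head_dim=128, seq_len=32_768)
# Grouped-query attention: far fewer KV heads, same query heads
gqa = kv_cache_bytes(layers=48, kv_heads=4, head_dim=128, seq_len=32_768)

print(mha / 2**30, gqa / 2**30, mha // gqa)  # 24.0 GiB, 3.0 GiB, 8x
```

The reduction factor is just query heads over KV heads, which is why grouped-query (or multi-latent) attention stretches the same VRAM over much longer sessions.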

At this point Qwen3.6 with thinking mode disabled seems like the best balance.


> It would have been much better if the bun team joined forces and helped out

Submitting patches is joining forces and helping out.


Submitting patches that are correct and match the project's desired standards¹ is joining forces and helping out.

--------

[1] And align with the project's direction. This part is of course much more subjective so could very easily be an honest misunderstanding of the situation.


Would you feel good about completely fake CSAM if it actually reduced incidence of child molestation?

If you don't understand and verify the scope of authorities a bearer token grants, then you are just begging for a security breach.

> We had no idea — and Railway's token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete.

So you effectively gave a junior dev a token with the authority to destroy your database, and then complained that the junior dev actually did so by accident while trying to solve some problems it had?

Obviously the AI shouldn't just search everywhere for bearer tokens to try when it runs into a roadblock, but frankly most of the blame does not fall on the AI here IMO. Know what authorities your bearer tokens grant, and understand the consequences of where you store them.
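One concrete way to "know what authorities your tokens grant" is to never hand the agent the raw token at all: route its API calls through a proxy that forwards only an explicit allow-list of operations. A minimal sketch; the operation names and crude regex-based extraction are illustrative (a real proxy should use a proper GraphQL parser):

```python
# Sketch: hold the privileged token server-side and only forward an
# allow-list of GraphQL operations on the agent's behalf. Operation
# names here are invented for illustration.
import re

ALLOWED_OPERATIONS = {"deploymentLogs", "serviceRestart"}  # no volumeDelete

def extract_operations(graphql_query: str) -> set[str]:
    # Crude top-level field extraction via regex; for illustration only.
    body = re.search(r"\{(.*)\}", graphql_query, re.S)
    if not body:
        return set()
    return set(re.findall(r"(\w+)\s*[({]", body.group(1)))

def authorize(graphql_query: str) -> bool:
    ops = extract_operations(graphql_query)
    return bool(ops) and ops <= ALLOWED_OPERATIONS

assert authorize('mutation { serviceRestart(id: "x") { ok } }')
assert not authorize('mutation { volumeDelete(id: "db") { ok } }')
```

Default-deny plus an allow-list means a confused agent can at worst do something you already decided was safe.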


> In particular, Wren gives up dynamic object shapes, which enables copy-down inheritance and substantially simplifies (and hence accelerates) method lookup.

A general rule of thumb is that if you can assign an expression a static type, then you can compile it fairly efficiently. Complex dynamic languages obviously actively fight this in numerous ways, and so end up being difficult to optimize. Seems obvious in retrospect.
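The copy-down trick itself fits in a few lines. A toy sketch (not Wren's actual implementation): flatten every inherited method into one per-class table at class-definition time, so a call is a single dict probe instead of a walk up the superclass chain. This is only sound because the table is frozen after definition, which is exactly the fixed-shape restriction being traded for:

```python
# Toy model of copy-down inheritance: each class carries a flat,
# pre-merged method table, frozen at definition time.
def make_class(name, superclass, methods):
    table = dict(superclass["methods"]) if superclass else {}
    table.update(methods)              # copy down, then override
    return {"name": name, "methods": table}

def invoke(cls, selector, *args):
    return cls["methods"][selector](*args)   # one flat lookup, no chain walk

Animal = make_class("Animal", None, {"speak": lambda: "..."})
Dog = make_class("Dog", Animal, {"speak": lambda: "woof"})
Puppy = make_class("Puppy", Dog, {})     # inherits the copied-down table

print(invoke(Puppy, "speak"))  # woof
```

A dynamic language that lets you add methods to `Animal` after `Puppy` exists would invalidate every copied-down table, which is precisely the optimization-hostile behavior described above.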


Seriously, when you're conversing with a person would you prefer they start rambling on their own interpretation or would you prefer they ask you to clarify? The latter seems pretty natural and obvious.

Edit: That said, it's entirely possible that large and sophisticated LLMs can invent some pretty bizarre but technically possible interpretations, so maybe this is to curb that tendency.


—So what would theoretically happen if we flipped that big red switch?

—Claude Code: FLIPS THE SWITCH, does not answer the question.

Claude does that in React, constantly starting a wrong refactor. I've only been using Claude for 4 weeks, but for the last 10 days the new nerfing has been giving me anger issues.


Yeah, this happens to me all the time! I have a separate session for discussing and only apply edits in worktrees / subagents to clearly separate discussion from work, and it still does it.


I sometimes prompt with leading questions where I actually want Claude to understand what I’m implying and go ahead and do it. That’s just part of my communication style. I suppose I’m the part of the distribution that ruins things for you.


> The latter seems pretty natural and obvious.

To me too. If something is ambiguous or unclear when someone gives me a task, I need to ask them to clarify; anything else would be borderline insane in my world.

But I know so many people whose approach is basically "Well, you didn't clearly state X, so it was up to me to interpret it however I wanted, usually in the easiest/shortest way for me", which is exactly how LLMs seem to take ambiguous prompts too, unless you strongly prompt them not to make a "reasonable attempt now without asking questions".



I have a fun little agent in my tmux agent orchestration system - Socratic agent that has no access to codebase, can't read any files, can only send/receive messages to/from the controlling agent and can only ask questions.

When I task my primary agent with anything, it has to launch the Socratic agent, give it an overview of what are we working on, what our goals are and what it plans to do.

This works better than any thinking tokens for me so far. It usually gets the model to write almost perfectly balanced plan that is neither over, nor under engineered.
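The contract described above is simple enough to sketch. Everything here is invented for illustration (function names, the stub LLM); the essential constraint is that the Socratic agent has no tools or file access and may only emit questions back to the primary agent:

```python
# Toy sketch of a "Socratic agent": no codebase access, no tools,
# output restricted to questions. All names are illustrative.
def socratic_turn(plan_summary: str, ask_llm) -> list[str]:
    """Return clarifying questions about a plan; never code or edits."""
    prompt = (
        "You cannot read the codebase. Based only on this summary, ask "
        "the questions a careful reviewer would ask before work starts:\n"
        + plan_summary
    )
    reply = ask_llm(prompt)
    # Enforce the contract mechanically: keep only actual questions.
    return [line.strip() for line in reply.splitlines()
            if line.strip().endswith("?")]

# Stub LLM for demonstration
fake_llm = lambda _: "What is the rollback plan?\nOK, ship it.\nWho owns the schema?"
print(socratic_turn("Migrate users table to UUID keys", fake_llm))
# ['What is the rollback plan?', 'Who owns the schema?']
```

Filtering the reply down to question lines is what keeps the critic from sliding into doing the work itself.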


Sounds pretty neat! Is there a written agent.md for that you could share?


When you’re staffing work to a junior, though, often it’s the opposite.


IME "don't ask questions and just do a bunch of crap based on your first guess that we then have to correct later after you wasted a week" is one of the most common junior-engineer failure modes and a great way for someone to dead-end their progression.


So you are saying they are trying for the whole Artificial Intern vibe ?


Diminishing returns are inevitable, agreed, but it's not clear we're near that point yet.


> What other uses do GPU's have that are critical...? lol

GPUs are essential to every kind of scientific and engineering simulation you can think of. AI-accelerated simulations are a huge deal now.


GPUs that have lives of..?

Now compare that with the life of a railroad. Amusing.


Some of those railroad bridges might never have been constructed without those simulations.


> A bridge with a 6 year lifespan for each beam is insane.

Not necessarily. Depends entirely on the value of the transport that the bridge enables.

