Kinda wonder what the extent of this is. You can get some really great results from your employees by mandating shit like this. /s
I use AI daily and frankly I love it when I think of it as "I write some rough instructions and it autocompletes the idea for me to a remarkable degree". AI literally types faster than me and is my new typewriter.
However, if I had to use it for every little thing, I'd do it. The problem is when it reaches the point where I'm using it to replace critical thinking about something I don't actually know yet.
The problem here is that these LLMs can and will churn out absolute trash. If this were done under a mandate, the only thing I'd be able to respond with when that trash gets questioned is "the AI did it" and "idk, I was using AI like I was told".
It literally falls into the "above my pay-grade" category when it comes down as a mandate.
I really hope there's more nuance to articles like these, though, and that the companies mandating AI use are doing so in a way that accounts for the limitations.
This article doesn't really clue me, the reader, in on whether that's the case or not.
Haha, yeah. I don't know any other language where I practically get a free parallelization+concurrency sandwich. It's kept me coming back to Go for a decade now, despite the runtime's signal usage preventing it from being used for system-level libraries. They literally broke my libnss-go package years ago when they picked the signal that controls the concurrency portion of the runtime.
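For anyone who hasn't felt that "sandwich": a minimal sketch of what I mean, fanning work out across goroutines and collecting it back over a channel (the names and the workload are just made up for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// square stands in for any CPU-bound or blocking unit of work.
func square(n int) int { return n * n }

func main() {
	inputs := []int{1, 2, 3, 4, 5}
	results := make(chan int, len(inputs)) // buffered so no goroutine blocks on send

	var wg sync.WaitGroup
	for _, n := range inputs {
		wg.Add(1)
		go func(n int) { // concurrency for free; the runtime spreads these across cores
			defer wg.Done()
			results <- square(n)
		}(n)
	}

	wg.Wait()
	close(results)

	for r := range results {
		fmt.Println(r)
	}
}
```

The same code gives you concurrency on one core and parallelism on many, without ever touching a thread API.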
So what do you do when the context engineer rolls up into chat and jailbreaks the system prompt that was designed? Do you just hire them on the spot? This literally isn’t what LLMs are designed for, nor is it something they’re good at.
Right there with you on this. These car companies would be better served ripping out their telemetry stacks and just providing a screen that lets a phone drive it. I pulled the jumper for the telemetry stack in my last two cars and I’m going to do the same in my next one. They don’t get to sell me a car that then tracks and reports my habits to my insurer without paying me for it, and only on an opt-in basis at that.
I'm entirely on board with this - and I'm driving a 2001-model car at least partly because of it (the other reason being that my state has antique tags you buy once and never pay for again, once the car is 25 years old). But what can you buy in the US that doesn't cripple the car (well, at least the entertainment system) in the process?
Today, I would say these models can be used by someone with minimal knowledge to churn out SPAs with React. They can probably get pretty far into making logins, message systems, and so on, because there is lots of training data for those things. They can even muddle through building desktop apps with relative ease compared to how I had to learn it years ago.
What these LLMs continue to prove, though, is that they are no substitute for real domain knowledge. To date, I've yet to have a model implement Raft consensus correctly when testing whether they can build a database.
The way I interact with these models is almost adversarial in nature. I prompt them with the bare minimum that a developer might get in a feature request. I may even have a planning session to populate the context before I set it off on a task.
The bias in these LLMs really shines through, and proves their autocomplete nature, when they have a strong pull towards changing the one snippet of code I wrote because it doesn't fit the shape their training data suggests the code should take. Most models will course-correct when instructed that they are wrong and I am right, though.
One thing I've noted is that if you let it generate choices for you from the start of a project, it will make poor choices in nearly every language. You can be using uv to manage a Python project and it will keep trying to use pip or bare python commands. You can start an Electron app and it will continuously botch whether it's using CommonJS or some other module standard. It persistently wants to download Go modules before coding instead of just writing the code and running `go mod tidy` after (it literally doesn't need the module in advance; it doesn't even have tools to probe the module before writing the code anyway).
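To make the `go mod tidy` point concrete, here's the workflow I mean (the uuid module is just an example dependency I picked for illustration): write the code with the import you need first, and let the tooling resolve it afterwards; nothing has to be downloaded up front.

```go
package main

import (
	"fmt"

	"github.com/google/uuid" // not downloaded yet, and that's fine
)

func main() {
	fmt.Println(uuid.NewString())
}

// After the code is written:
//   go mod tidy   # adds github.com/google/uuid to go.mod and fetches it
//   go run .
```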
Raft consensus is my go-to test because there is no one-size-fits-all way to implement it. A model might get an in-memory key store right, but what if you want it to organize etcd/raft/v3 in a way that supports multi-group Raft? What if you need Raft to coordinate some other form of data replication? None of these LLMs can really do it without a lot of prep work.
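To give a sense of what "multi-group" means here, this is roughly the structure I'm asking for: one process hosting many independent Raft groups with traffic routed by group ID. Everything below is a hypothetical sketch (the `RaftGroup` interface and `MultiRaft` type are made up for illustration, not etcd/raft/v3's actual API), but it's the shape the models struggle to reach on their own.

```go
package main

import "fmt"

// GroupID identifies one independent Raft group (e.g. one shard or range).
type GroupID uint64

// RaftGroup is a stand-in for a single consensus group; a real implementation
// would wrap an actual Raft node plus its storage.
type RaftGroup interface {
	Propose(data []byte) error
	Step(msg []byte) error // deliver an incoming Raft message to this group
}

// MultiRaft hosts many groups in one process and routes messages between them.
type MultiRaft struct {
	groups map[GroupID]RaftGroup
}

func NewMultiRaft() *MultiRaft {
	return &MultiRaft{groups: make(map[GroupID]RaftGroup)}
}

// AddGroup registers a new consensus group, e.g. when a shard is created.
func (m *MultiRaft) AddGroup(id GroupID, g RaftGroup) {
	m.groups[id] = g
}

// Route hands a message to the group it belongs to. Getting this routing,
// per-group storage, and a shared transport right is where generated code falls over.
func (m *MultiRaft) Route(id GroupID, msg []byte) error {
	g, ok := m.groups[id]
	if !ok {
		return fmt.Errorf("unknown raft group %d", id)
	}
	return g.Step(msg)
}

func main() {
	m := NewMultiRaft()
	// Nothing registered yet, so routing fails loudly instead of dropping data.
	if err := m.Route(GroupID(7), []byte("ping")); err != nil {
		fmt.Println(err)
	}
}
```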
This is across all the models available from OpenAI, Anthropic, and Google.
Which model were you using? In my experience Gemini 2.5 Pro is just as good as Claude Sonnet 4 and 4.5. It's literally what I use as a fallback to wrap something up if I hit the 5 hour limit on Claude and want to just push past some incomplete work.
I'm just going to throw this out there: I get good results from a truly trash model like gpt-oss-20b (quantized at 4 bits). The reason I can literally use this model is that I know my shit and have spent time learning how much instruction each model I use needs.
Would be curious what you're actually having issues with if you're willing to share.
I share the same opinion on Gemini CLI. Other than for the simplest tasks it is just not usable: it gets stuck in loops, ignores instructions, and fails to edit files. Plus it has plenty of bugs in the CLI itself that you occasionally hit.
I wish I could use it rather than pay for an extra subscription for Claude Code, but it is just in a different league (at least as of a couple of weeks ago).
Which model are you using though? When I run out of Gemini 2.5 Pro and it falls back to the Flash model, the Flash model is absolute trash for sure. I have to prompt it like I do local models. Gemini 2.5 Pro has shown me good results though. Nothing like "ignores instructions" has really occurred for me with the Pro model.
That's weird. I prompt 2.5 Pro and Claude Sonnet 4.5 about the same for most TypeScript problems and they end up performing about the same. I get different results with Terraform, though; I think Gemini 2.5 Pro does better on some Google Cloud stuff, but only on the specifics.
It's just strange to me that my experience seems to be the polar opposite of yours.
I don't know. The last problem I tried was a complex one -- migration of some scientific code from CPU to GPU. Gemini was useless there, but Claude proposed realistic solutions and was able to explore and test those.
The type of stuff I tend to do is much more complex than a simple website. I really can't rely on AI as heavily for stuff that I really enjoy tinkering with. There's just not enough data for them to train on to truly solve distributed system problems.
Code losing value was my first takeaway when trialing these coder agents. I think one of the skills I lean on more now than I ever had to in the past is "how can I break this" thinking. The ability to validate the results is more valuable than being able to produce code now, imo.
No. You may not even talk around the topics; that's covered explicitly in the training. When anyone asks me about my last job, I don’t really state much more than what the job title was and what the job posting said for the role I applied to.