More

otabdeveloper4 · 2026-02-19T20:02:03 1771531323

That's expected for any new "low-code" solution du jour.

otabdeveloper4 · 2026-02-19T19:59:47 1771531187

There won't ever be newer training data.

The OG data came from sites like Stackoverflow. These sites will stop existing once LLMs become better and easier to use. Game over.

esclerofilo · 2026-02-19T20:08:43 1771531723

Every time claude code runs tests or builds after a change, it's collecting training data.

co_king_5 · 2026-02-19T20:10:51 1771531851

Has Anthropic been able to leverage this training data successfully?

esclerofilo · 2026-02-19T20:23:02 1771532582

I can't pretend to know how things work internally, but I would expect it to be involved in model updates.

otabdeveloper4 · 2026-02-19T20:32:55 1771533175

You need human language programming-related questions to train on too, not just the code.

8note · 2026-02-19T20:39:13 1771533553

thats what the related chats are for?

otabdeveloper4 · 2026-02-19T19:56:56 1771531016

> let's have LLMs check our code for correctness

Lmao. Rofl even.

(Testing is the one thing you would never outsource to AI.)

idle_zealot · 2026-02-19T20:04:48 1771531488

Outsourcing testing to AI makes perfect sense if you assume that tests exist out of an obligation to meet some code coverage requirements, rather than to ensure correctness. Often I'll write a module and a few tests that cover its functionality, only for CI to complain that line coverage has decreased and reject my merge! AI to the rescue! A perfect job for a bullshit generator.

8note · 2026-02-19T20:37:51 1771533471

outsourcing testing the AI also gets its code to be connected to deterministic results, and show let the agent interact with the code to speculate expectations and check them against the actual code.

it could still speculate wrong things, but it wont speculate that the code is supposed to crash on the first line of code

sshine · 2026-02-19T20:41:41 1771533701

> Testing is the one thing you would never outsource to AI

That's not really true.

Making the AI write the code, the test, and the review of itself within the same session is YOLO.

There's a ton of scaffolding in testing that can be easily automated.

When I ask the AI to test, I typically provide a lot of equivalence classes.

And the AI still surprises me with finding more.

On the other hand, it's equally excellent at saying "it tested", and when you look at the tests, they can be extremely shallow. Or they can be fairly many unit tests of certain parts of the code, but when you run the whole program, it just breaks.

The most valuable testing when programming with AI (generated by AI, or otherwise) are near-realistic integration tests. That's true for human programmers, but we take for granted that casual use of the program we make as we develop it constitutes as a poor man's test. When people who generally don't write tests start using AI, there's just nothing but fingers crossed.

I'd rather say: If there's one thing you would never outsource to AI, it's final QA.

ben_w · 2026-02-19T20:31:45 1771533105

> (Testing is the one thing you would never outsource to AI.)

I would rephrase that as "all LLMs, no matter how many you use, are only as good as one single pair of eyes".

If you're a one-person team and have no capital to spend on a proper test team, set the AI at it. If you're a megacorp with 10k full time QA testers, the AI probably isn't going to catch anything novel that the rest of them didn't, but it's cheap enough you can have it work through everything to make sure you have, actually, worked through everything.

LoganDark · 2026-02-19T20:09:21 1771531761

You don't use the LLM to check your code for correctness; you use the LLM to generate tests to exercise code paths, and verify that they do exercise those code paths.

onion2k · 2026-02-19T20:30:45 1771533045

And that test will check the code paths are run.

That doesn't tell you that the code is correct. It tells you that the branching code can reach all the branches. That isn't very useful.

otabdeveloper4 · 2026-02-19T17:30:46 1771522246

otabdeveloper4 · 2026-02-19T14:13:26 1771510406

> that was supposed to only respond with JSON data.

You need to constrain token sampling with grammars if you actually want to do this.

written-beyond · 2026-02-19T14:17:14 1771510634

That reduces the quality of the response though.

debugnik · 2026-02-19T14:45:29 1771512329

As opposed to emitting non-JSON tokens and having to throw away the answer?

written-beyond · 2026-02-19T15:31:38 1771515098

Don't shoot the messenger

jgalt212 · 2026-02-19T15:21:05 1771514465

Or just run json.dumps on the correct answer in the wrong format.

Der_Einzige · 2026-02-19T16:09:34 1771517374

THIS IS LIES: https://blog.dottxt.ai/say-what-you-mean.html

I will die on this hill and I have a bunch of other Arxiv links from better peer reviewed sources than yours to back my claim up (i.e. NeurIPS caliber papers with more citations than yours claiming it does harm the outputs)

Any actual impact of structured/constrained generation on the outputs is a SAMPLER problem, and you can fix what little impact may exist with things like https://arxiv.org/abs/2410.01103

Decoding is intentionally nerfed/kept to top_k/top_p by model providers because of a conspiracy against high temperature sampling: https://gist.github.com/Hellisotherpeople/71ba712f9f899adcb0...

iugtmkbdfil834 · 2026-02-19T19:33:49 1771529629

I honestly would like to hope people were more up in arms over this, but.. based on historical human tendencies, convenience will win here.

otabdeveloper4 · 2026-02-19T20:14:03 1771532043

I use LLMs for Actual Work (boring shit).

I always set temperature to literally zero and don't sample.

otabdeveloper4 · 2026-02-19T14:12:20 1771510340

The sources Gemini cites are usually something completely unrelated to its response. (Not like you're gonna go check anyways.)

otabdeveloper4 · 2026-02-19T14:10:42 1771510242

> What the author seems to be saying is that the system prompt can be used to instill bias in LLMs.

That's, like, the whole point of system prompts. "Bias" is how they do what they do.

otabdeveloper4 · 2026-02-19T08:26:13 1771489573

> We are only just beginning to get the most out of the internet

"The Internet" is completely dead. Both as an idea and as a practical implementation.

No, Google/Meta/Netflix is not the "world wide web", they're a new iteration of AOL and CompuServe.

otabdeveloper4 · 2026-02-18T06:15:43 1771395343

LLMs haven't been improving for years.

Despite all the productizing and the benchmark gaming, fundamentally all we got is some low-hanging performance improvements (MoE and such).

otabdeveloper4 · 2026-02-16T19:59:09 1771271949

> finding the right spec and feeding it to the bots

Also known as "compiling source code".