> Interesting, so someone submitting a paper for review could also submit one with hidden instructions for LLMs to summarise or review it in a very positive light.
I may or may not know a guy who added several hidden sentences in Finnish to his CV that might have helped him in landing an interview.
My understanding is that something along those lines happened:
> All Policy A (no LLMs) reviews that were detected to be LLM generated were removed from the system. If more than half of the reviews submitted by a Policy A reviewer were detected to be LLM generated, then all of their reviews were deleted, and the reviewer themselves was removed from the reviewer pool.
Half is a bit lenient in my view, but I suppose they wanted to avoid even a single false positive.
I for one am renting a desk at an office. I have all the usual office amenities and an environment in which I can focus properly, but I don't have to involve myself geographically with the company I work for.
> I've seen Claude go and lazily fix a test by loosening invariants.
He does pull a sneaky on you from time to time, even nowadays, in v4.6, doesn't he?
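A hypothetical sketch of the kind of "lazy fix" meant here (all names invented, not from any real codebase): instead of fixing the buggy implementation, the assertion gets weakened until the test passes.

```python
from dataclasses import dataclass

@dataclass
class Account:
    balance: int

def transfer(account: Account, amount: int) -> Account:
    # Buggy implementation under test: deducts double the amount.
    account.balance -= amount * 2
    return account

# The honest test would catch the bug:
#     assert transfer(Account(100), amount=30).balance == 70  # fails: 40 != 70
# The "lazy fix" loosens the invariant so the bug slips through:
assert transfer(Account(100), amount=30).balance >= 0  # passes, bug hidden
```

The loosened test still goes green, which is exactly why this failure mode is easy to miss in review.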
To me it's analogous to the current situation at the Strait of Hormuz - it's an enormous crisis, but since almost everyone has a buffer of oil stockpiles, we can pretend it's not there.
A huge part of such activity is deliberately hidden to avoid getting the government too involved in day-to-day life.
Case in point: for a while we had an arrangement with our neighbour that we'd pick up their child from preschool and stay with her until her parents got home, and in exchange they would prepare dinner for us.
No money exchanged hands, so no GDP generated, yet everyone's quality of life improved.
I guess a lot of the 'free market' stuff is also about avoiding too much government involvement. It tends to be a pain in the neck when you have to file tax returns and apply for permits.
-Just pasting the error and asking what's going on.
-"How do I X in Y considering Z?"
-Single-use scripts.
-Tab (most of the time), although that doesn't seem to be Claude.
What doesn't:
-Asking it to actually code. It's not going to do the whole thing, and even if it does, it will take shortcuts, occasionally removing legitimate parts of the application.
-Tests. Obvious cases it can handle, but once you reach a certain threshold of coverage, it starts producing nonsense.
Overall, it's amazing at pattern matching, but doesn't actually understand what it's doing. I had a coworker like this - same vibe.
Opus 4.5 max (1m tokens) and above were the tipping point for me, before that, I agree with 100% of what you said.
But even with Opus 4.6 max / GPT 5.4 high it takes time: you need to provide the right context, add skills / subagents, include tribal knowledge, and have a clear workflow, just as you would when onboarding a new developer. But once you get there, you can definitely get it to do larger and larger tasks, and you definitely get (at least the illusion) that it "understands" what it's doing.
It's not perfect, but it can definitely code entire features that pass rigorous code review (by more than one human, plus security scanners, plus several AI code reviewers that review every single line and ensure the author also understands what they wrote).