This was a couple of years ago, but I remember using ChatGPT to try and study for a certification by generating quiz questions.
Over time it would start making "C" the correct answer for every question, no matter what I tried. Eventually I was so focused on whether it was stuck in a "C" loop that I started overthinking every question and wasting time.
Fast forward to recently testing Sonnet 4.6 to see if it could effectively teach me something new: I got about 5 prompts in before I had to point out an oversight, and it gave me the classic "you're absolutely right, ignore that suggestion".
This is anecdotal of course, but at least LLMs are helping to build my skills of fact verification and citation checking!
Strangely enough, my first test with Sonnet 4.6 via the API, for a relatively simple request, was more expensive ($0.11) than my average request to Opus 4.6 (~$0.07), because it used far more tokens than I would consider necessary for the prompt.
This is an interesting trend with recent models: the smarter ones get away with far fewer thinking tokens, which partially or even fully negates the speed/price advantage of the smaller models.
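A back-of-the-envelope sketch of the effect (all prices and token counts below are made-up placeholders, not real rates for either model): a model that is cheaper per token but thinks longer can still cost more per request.

```python
# Rough sketch: per-request cost = tokens * per-token price.
# All numbers are illustrative placeholders, NOT real pricing or measured usage.

def request_cost(input_tokens, output_tokens, in_price_per_mtok, out_price_per_mtok):
    """Cost in dollars for one request, given per-million-token prices."""
    return (input_tokens * in_price_per_mtok + output_tokens * out_price_per_mtok) / 1_000_000

# Hypothetical "smaller" model: cheaper per token, but spends many more thinking tokens.
small = request_cost(input_tokens=2_000, output_tokens=6_500,
                     in_price_per_mtok=3.0, out_price_per_mtok=15.0)

# Hypothetical "larger" model: pricier per token, but answers with far less thinking.
large = request_cost(input_tokens=2_000, output_tokens=500,
                     in_price_per_mtok=15.0, out_price_per_mtok=75.0)

print(f"small model: ${small:.2f}, large model: ${large:.2f}")
# With these placeholder numbers the cheaper-per-token model ends up the more
# expensive request, which is the effect described above.
```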
> Tech people are always talking about dinner reservations . . . We're worried about the price of lunch, meanwhile tech people are building things that tell you the price of lunch. This is why real problems don't get solved.
i think that's conflating two things (am not an expert). opencode exploited unauthorized use/api access, but obviously anything using the claude code sdk is kosher, because it's literally anthropic's blessed way to do this
I found this skybrary article on cockpit automation really interesting, since the detail in aviation literature is so thorough and small topics like this get considered carefully.