It doesn't seem to me that you're familiar with my work - you seem to be mixing me in with the vast ocean of uncritical LLM boosting content that's out there.
I'm thinking of the system you built to watch videos and parse JSON, and the claims that it has general suitability, which is simply dishonest imo. You seem to be confusing me with someone that hasn't been asking you repeatedly to address these kinds of concerns, and the above series is a kind of Potemkin set of things that doesn't intersect with your other work.
To my surprise, on re-reading that post I didn't mention that you need to double-check everything it does. I guess I forgot to mention that at the time because I thought it was so obvious: anyone who's paying attention to LLMs should already know that you can't trust them to reliably extract this kind of information.
I've mentioned that a lot in my other writing. I frequently tell people that the tricky thing about working with LLMs is learning how to make use of a technology that is inherently unreliable.
Second update: I just noticed that I DID say "You should never trust these things not to make mistakes, so I re-watched the 35 second video and manually checked the numbers. It got everything right." in that post already.
> You seem to be confusing me with someone that hasn't been asking you repeatedly to address these kinds of concerns
It's like criticizing a "Hello World" program for not having proper error handling and security protocols. While those are important for production systems, they're not the point of a demonstration or learning example.
Your response seems to take these examples and hold them to the standard of mission-critical systems, which is a form of technical gatekeeping - raising the bar unnecessarily high for what counts as a "valid" technical demonstration.
Here's my series about misconceptions: https://simonwillison.net/series/llm-misconceptions/