I've written extensively about myths and misconceptions about LLMs, much of whic...

th0ma5 · 2024-12-22T01:09:08

I'm thinking of the system you built to watch videos and parse JSON, and the claims that it has general suitability, which is simply dishonest imo. You seem to be confusing me with someone that hasn't been asking you repeatedly to address these kinds of concerns, and the above series is a kind of Potemkin set of things that don't intersect with your other work.

simonw · 2024-12-22T23:37:02

You mean this? https://simonwillison.net/2024/Oct/17/video-scraping/

To my surprise, on re-reading that post I didn't mention that you need to double-check everything it does. I guess I forgot to mention that at the time because I thought it was so obvious: anyone who's paying attention to LLMs should already know that you can't trust them to reliably extract this kind of information.

I've mentioned that a lot in my other writing. I frequently tell people that the tricky thing about working with LLMs is learning how to make use of a technology that is inherently unreliable.

Update: added a new note about reliability here: https://simonwillison.net/2024/Oct/17/video-scraping/#a-note...

Second update: I just noticed that I DID say "You should never trust these things not to make mistakes, so I re-watched the 35 second video and manually checked the numbers. It got everything right." in that post already.

> You seem to be confusing me with someone that hasn't been asking you repeatedly to address these kinds of concerns

Where did you do that?

kordlessagain · 2024-12-22T16:55:50

> dishonest Potemkin

It's like criticizing a "Hello World" program for not having proper error handling and security protocols. While those are important for production systems, they're not the point of a demonstration or learning example.

Your response seems to take these examples and hold them to the standard of mission-critical systems, which is a form of technical gatekeeping - raising the bar unnecessarily high for what counts as a "valid" technical demonstration.