You're confusing an intuition for what works with being too close to your problem to judge the general applicability of your techniques.
They're saying it might work for you, but that it isn't generally applicable (because most people aren't going to develop this intuition, presumably).
Not sure I agree with that. I would say there are classes of problems where LLMs will generally help, and a brief training course (a week, say) would vastly improve the average (non-LLM-trained) engineer's ability to use them productively.
No, it's more like thinking that my prescribed way of doing things must be how things work in general because it works for me. You give instructions covering everything you did, but the person you hand them to isn't your exact height, or can't read your language, so it's easy to conclude that they just don't get it. With these LLMs that bias is also hidden from you as you inch closer to the solution at every turn. The result seems "obvious", but the outcomes were never guaranteed and will most likely be different for someone else if even one thing at any point is different.
My whole thing about LLMs is that using them isn't "obvious". I've been banging that drum for over a year now - the single biggest misconception about LLMs is that they are easy to use and you don't need to put a serious amount of effort into learning how to best apply them.
To me it's more that the effort you put in isn't a net gain. You don't build up a transferable way of working with them, for reasons ranging from ownership of the models, to the fundamentals of the total probabilistic space of the interaction, to simple randomness even at low temperatures. "Learning how to best apply them" isn't definable, because who is learning to apply what to what? The most succinct way I can describe these issues is that, like startup success, many of the assumptions behind the projects you present amount to saying "these are the lotto numbers that worked for me".
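To make the low-temperature point concrete, here is a minimal sketch, purely illustrative and with made-up logits (nothing from the projects being discussed): lowering the temperature sharpens the softmax distribution but never drives the runner-up tokens to zero, so two runs can still diverge.

    import math
    import random

    def sample_with_temperature(logits, temperature, rng):
        # Sample an index from softmax(logits / temperature).
        scaled = [l / temperature for l in logits]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        probs = [e / total for e in exps]
        r = rng.random()
        cumulative = 0.0
        for i, p in enumerate(probs):
            cumulative += p
            if r < cumulative:
                return i
        return len(probs) - 1

    # Made-up logits for four candidate tokens at a single decoding step.
    logits = [4.0, 3.5, 1.0, 0.5]
    rng = random.Random(0)

    for temp in (1.0, 0.3):
        counts = [0, 0, 0, 0]
        for _ in range(10_000):
            counts[sample_with_temperature(logits, temp, rng)] += 1
        print(f"temperature={temp}: {counts}")

    # Even at 0.3 the runner-up token still gets sampled a noticeable fraction
    # of the time, and one different token early in a generation is enough to
    # send the rest of it down a different path.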
Even in real, traditional, deterministic systems where you explicitly design a feature, coherence is hard to maintain as usage grows over time. Think of tab stops on a typewriter evolving from an improvised template, to metal tabs installed above the keyboard, to someone cutting and pasting incorrectly and accidentally reflowing a 200-page document to 212 pages because of tab characters...
If you create a system with these models that writes the code to process a bunch of documents in some way, or does some kind of herculean automation, you haven't improved the situation when it comes to clarity or simplicity, even if the task at hand finishes sooner for you in this moment.
Every token generated has the same potential to spiral out into new complexities and whack-a-mole issues that tie you to assumptions about the system design, all while providing a veneer of control over the intersections of those issues; as the situation grows, you create an ever bigger problem space.
And I can definitely hear the reply: this is the point where you use some sort of full-stack, interoceptive, holistic intuition about how to persuade the system towards a higher-order concept of the system, expand your ideas about how the problem could be solved, and let the model guide you... And that is precisely the mysticism I object to, because it isn't actually a kind of productiveness but a struggle, a constant guessing, and any insight from it can be taken away, changed accidentally, censored, or packaged as a front-run against your control.
Additionally, the lack of separate in-band and out-of-band data streams means that even with agents, reasoning, and all the other avenues for improving performance, you still can't escape the fundamental question of what the total information contained in the entire probabilistic space is. If you try to do out-of-band control in some way, like the latest thing I just read where they have a separate censoring layer, you either wind up using another LLM layer there, which still carries all of these issues, or you use some kind of non-transformer method like Bayesian filtering and you get all of the issues outlined in the seminal spam.txt document...
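For concreteness, here is a rough sketch of the out-of-band pattern I mean, with every name and training example hypothetical: a tiny naive-Bayes-style word scorer, in the spirit of classic spam filtering, sits outside the prompt stream and vetoes model output. It doesn't escape the problem, it just inherits the usual spam-filter trade-offs (false positives, easy evasion).

    import math
    from collections import Counter

    class NaiveBayesFilter:
        # Tiny word-level naive-Bayes scorer with add-one smoothing.
        def __init__(self):
            self.blocked_counts = Counter()
            self.allowed_counts = Counter()
            self.blocked_total = 0
            self.allowed_total = 0

        def train(self, text, blocked):
            words = text.lower().split()
            if blocked:
                self.blocked_counts.update(words)
                self.blocked_total += len(words)
            else:
                self.allowed_counts.update(words)
                self.allowed_total += len(words)

        def blocked_log_odds(self, text):
            # Positive score means the text looks more like the blocked examples.
            vocab = len(set(self.blocked_counts) | set(self.allowed_counts)) or 1
            score = 0.0
            for word in text.lower().split():
                p_blocked = (self.blocked_counts[word] + 1) / (self.blocked_total + vocab)
                p_allowed = (self.allowed_counts[word] + 1) / (self.allowed_total + vocab)
                score += math.log(p_blocked / p_allowed)
            return score

    def generate(prompt):
        # Stand-in for the model call; in reality this is the in-band,
        # probabilistic part the filter has no visibility into.
        return "here is the summary you asked for: ..."

    flt = NaiveBayesFilter()
    flt.train("buy pills now limited time offer", blocked=True)
    flt.train("here is the summary you asked for", blocked=False)

    completion = generate("summarize this document")
    if flt.blocked_log_odds(completion) > 0:
        print("[withheld by out-of-band filter]")
    else:
        print(completion)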
So, given all of this, I think the kinds of feats you demonstrate are really neat, but I object to boiling these issues down to "putting a serious amount of effort into learning how to best apply them", because I just don't think that's a coherent view of the total problem, nor something achievable the way learning is in other subjects like math. I know it isn't answerable, but for me a guiding question remains: why do I have to work at all? Is the model too small to know what I want or mean without any effort? The pushback against prompt engineering and the rise of agentic stuff and reasoning all seem to essentially be saying that, but they too have hit diminishing returns.