> I don't think there's evidence that this issue would persist after continuing to scale models to be larger and doing more RL
And how much larger do we need to make the models? 2x? 3x? 10x? 100x? How large do they need to get before scaling up somehow solves everything?
Because: 2x larger, means 2x more memory and compute required. Double the cost or half the capacity. Would people still pay for this tech if it doubles in price? Bear in mind, much of it is already running at a loss even now.
And what if 2x isn't good enough? Would anyone pay for a 10x larger model? Can we even realistically run such models as anything other than a very expensive PoC, and for a very short time? And who's to say that even 10x will finally solve things? What if we need 40x? Or 100x?
Oh, and of course: Larger models also require more data to train them on. And while the Internet is huge, it's still finite. And when things grow geometrically, even `sizeof(internet)` eventually runs out ... and, in fact, may have done so already [1] [2]
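To put rough numbers on the data problem, here's a back-of-the-envelope sketch. It assumes the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter, which is an assumption about training recipes, not a fact about any particular frontier model:

```python
# Back-of-the-envelope: how much training data do bigger models want?
# Assumes the Chinchilla-style heuristic of ~20 tokens per parameter;
# real frontier training runs may deviate substantially from this.
TOKENS_PER_PARAM = 20

base_params = 1e12  # hypothetical 1-trillion-parameter baseline
for scale in (1, 2, 10, 100):
    params = base_params * scale
    tokens = params * TOKENS_PER_PARAM
    print(f"{scale:>3}x model ({params:.0e} params): ~{tokens:.0e} training tokens")

# At 100x the token budget is ~2e15, far past common estimates of usable
# high-quality public text, which are in the tens of trillions of tokens.
```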
What if we actually discover that scaling up doesn't even work at all, because of diminishing returns? Oh wait, looks like we did that already: [3]
Scaling applies to multiple dimensions simultaneously over time. A frontier model today could be replicated a year later with a model half the size, with a quarter of the FLOPS, etc. I don’t know the real numbers for optimization scaling, but you could check out NanoGPT speedrun [1] as an example.
The best solution in the meantime is giving the LLM a harness that allows tool use like what coding agents have. I suspect current models are fully capable of solving arbitrary complexity artificial reasoning problems here, provided that they’re used in the context of a coding agent tool.
Some problems are just too complex and the effort to solve them increases exponentially. No LLM can keep up with exponentially increasing effort unless you run it for an adequate number of years.
What? Fundamentally, information can only be so dense. Current models may be inefficient w.r.t. information density, however, there is a lower bound of compute required. As a pathological example, we shouldn't expect a megabyte worth of parameters to be able to encode the entirety of Wikipedia.
> They are good at repeating their training data, not thinking about it.
Which shouldn't come as a surprise, considering that this is, at the core of things, what language models do: Generate sequences that are statistically likely according to their training data.
This is too large of an oversimplification of how an LLM works. I hope the meme that they are just next token predictors dies out soon, before it becomes a permanent fixture of incorrect but often stated “common sense”. They’re not Markov chains.
Sure, but a complex predictor is still a predictor. It would be a BAD predictor if everything it output was not based on "what would the training data say?".
If you ask it to innovate and come up with something not in its training data, what do you think it will do? It'll "look at" its training data and regurgitate (predict) something labelled as innovative.
You can put a reasoning cap on a predictor, but it's still a predictor.
They are both designed, trained, and evaluated by how well they can predict the next token. It's literally what they do. "Reasoning" models just build up additional context of next-token predictions, and RL is used to bias output options towards ones more appealing to human judges. It's not a meme. It's an accurate description of their fundamental computational nature.
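For what it's worth, here is that loop stripped to its bones. This is a toy sketch with a made-up vocabulary and a random stand-in for the model, not any vendor's implementation; a "reasoning" model runs essentially the same loop, just appending many more intermediate tokens to the context first:

```python
import random

# Toy sketch of the decoding loop. The "model" here is a random stand-in for
# a trained network's next-token distribution; the loop around it is the part
# described above.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def fake_model(context: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for an LLM: context in, next-token distribution out."""
    weights = {tok: random.random() for tok in VOCAB}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    context = list(prompt)
    for _ in range(max_new_tokens):
        dist = fake_model(context)
        # Sample the next token from the predicted distribution, append it,
        # and repeat with the grown context. That is the whole loop.
        next_tok = random.choices(list(dist), weights=list(dist.values()))[0]
        context.append(next_tok)
        if next_tok == ".":
            break
    return context

print(" ".join(generate(["the", "cat"])))
```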
Yes. That's not the devastating take-down you think it is. Are you positing that people have souls? If not, then yes: human chain-of-thought is the equivalent of next token prediction.
The problem is in adding the word "just" for no reason.
It makes the statement of a fact a type of rhetorical device.
It is the difference between saying "I am a biological entity" and "I am just a biological entity". There are all kinds of connotations that come along for the ride with the latter statement.
Then there is the counter with the romantic statement that "I am not just a biological entity".
> We really need something that could store data for 80 years minimum.
We have that. We know how to put digital data onto paper, at high density. Not high compared to actual drives or even optical disks, of course, but still enough that we could put all the important data that a person produces throughout a lifetime into a large box of A4 sheets, which would still be legible after many decades. All that's needed is an agreement on a clever collection of formats for text and images, maybe even video, formats that are well documented (ideally the documentation is stored alongside the data).
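As a rough sanity check on "a large box of A4 sheets", here is a back-of-the-envelope sketch. The density figure is an assumption, on the order of what dense 2D barcodes achieve on ordinary paper with error correction, not a measured value:

```python
# Back-of-the-envelope: how much digital data fits in a box of A4 paper?
# Assumed density: ~50 KB of payload per side at 2D-barcode-like density
# with error correction -- an assumption for illustration, not a measurement.
BYTES_PER_SIDE = 50_000
SIDES_PER_SHEET = 2
SHEETS_PER_BOX = 2_500          # roughly one archive box of ordinary paper

capacity = BYTES_PER_SIDE * SIDES_PER_SHEET * SHEETS_PER_BOX
print(f"~{capacity / 1e6:.0f} MB per box")   # ~250 MB

# Plenty for a lifetime of text (a long novel is ~1 MB uncompressed), tight
# for photos, hopeless for video -- which matches the point above: the hard
# part is agreeing on well-documented formats, not raw capacity.
```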
The problem is not that we don't have the tech to do such things, the problem is
a) In our current world, the only things that seem to get huge amounts of resources are those that make some shareholders happy
b) Most of the data humanity produces these days, is useless noise, and the only reason anyone collects it, is to make a quick buck. And generative AI has made this trend a lot worse.
I would assume so. I expect there to be a lot of job postings looking for more "sexy" technologies to create the image that those companies are growing and planning for the future. And conversely, I wouldn't expect any job postings for old "streets behind" technologies like COBOL to be fake, as they wouldn't help with such signalling.
It’s not 2x the price, I paid under $900 for a brand new one.
Battery is $60. How much does a MacBook battery cost? How long does a MacBook battery take to replace, and how much skill do you need to replace it? How do you upgrade the storage capacity on a MacBook?
Worse how? RAM, SSD and main board can be upgraded as and when needed, which is the point.
I like Framework's aesthetics more than the MacBook's already, and I like the little customisability (e.g. bezel, mismatched coloured parts, etc.). I can accept a lower-quality screen (compared to MacBook), speakers and camera, no problem.
I'm willing to pay higher than MacBook price for the above package due to the superiority of Linux over macOS and to support this model in general. However, I draw a line in the sand at battery life, so Mac it is for me for the foreseeable future.
> We don't just keep adding more words to our context window, because it would drive us mad.
That, and we also don't only focus on the textual description of a problem when we encounter one. We don't see the debugger output and go "how do I make this bad output go away?!?". Oh, I am getting an authentication error. Well, maybe I should just delete the token check for that code path... problem solved?!
No. Problem very much not-solved. In fact, problem very much very bigger big problem now, and [Grug][1] find himself reaching for club again.
Software engineers are able to step back, think about the whole thing, and determine the root cause of a problem. I am getting an auth error...ok, what happens when the token is verified...oh, look, the problem is not the authentication at all...in fact there is no error! The test was simply bad and tried to call a higher privilege function as a lower privilege user. So, test needs to be fixed. And also, even though it isn't per-se an error, the response for that function should maybe differentiate between "401 because you didn't authenticate" and "401 because your privileges are too low".
Programmers are mostly translating business rules into the very formal process execution of the computer world. And you need to know both what the rules mean and how the computer works (or at least how the abstracted version you're working with works). The translation is messy at first, which is why you need to revise it again and again. Especially when later rules come along, challenging all the assumptions you've made or even contradicting themselves.
Even translations between human languages (which allow for ambiguity) can be messy. Imagine if the target language is for a system that will do exactly as told unless someone has qualified those actions as bad.
Good programmers working hand in glove with good companies do much more than this. We question the business logic itself and suggest non-technical, operational solutions to user issues before we take a hammer to the code.
Also, as someone else said, consider the root causes of an issue, whether those are in code logic or business ops or some intersection between the two.
When I save twenty hours of a client's money and my own time, by telling them that a new software feature they want would be unnecessary if they changed the order of questions their employees ask on the phone, I've done my job well.
By the same token, if I'm bored and find weird stuff in the database indicating employees tried to perform the same action twice or something, that is something that can be solved with more backstops and/or a better UI.
Coding business logic is not a one-way street. Understanding the root causes and context of issues in the code itself is very hard and requires you to have a mental model of both domains. Going further and actually requesting changes to the business logic which would help clean up the code requires a flexible employer, but also an ability to think on a higher order than simply doing some CRUD tasks.
The fact that I wouldn't trust any LLM to touch any of my code in those real world cases makes me think that most people who are touting them are not, in fact, writing code at the same level or doing the same job I do. Or understand it very well.
True, and LLMs have no incentive to avoid writing code. It's even worse: they are "paid" by the amount of code they generate. So the default behavior is to avoid asking questions to refine the need. They thrive on blurry and imprecise prompts because, in any case, they'll generate thousands of lines of code, regardless of pertinence.
Many people confirmed that in their experience.
I've never seen an LLM step back, ask questions and then code, or avoid coding. By design, it's a choice to generate as much stuff as possible, because of money.
So right now an LLM and the developer you describe here are two very different things, and an LLM will, by design, never replace you.
> When I save twenty hours of a client's money and my own time, by telling them that a new software feature they want would be unnecessary if they changed the order of questions their employees ask on the phone, I've done my job well.
I like to explain my work as "do whatever is needed to do as little work as possible".
Be it by improving logs, improving the architecture, pushing responsibilities around or rejecting some features.
The most clever lines of code are the ones you don’t write. Often this is a matter of properly defining the problem in terms of data structure. LLMs are not at all good at seeing that a data structure is inside out and that by turning it right side in, we can fix half the problems.
More significantly though, OP seems right on to me. The basic functionality of LLMs is handy for a code-writing assistant, but it does not replace a software engineer, and is not ever likely to, no matter how many janky accessories we bolt on. LLMs are fundamentally semantic pattern-matching engines, and are only problem solvers in the context of problems that are either explicitly or implicitly defined and solved in their training data. They will always require supervision because there is fundamentally no difference between a useful LLM output and a "hallucination" except the utility rating that a human judge applies to the output.
LLMs are good at solving fully defined, fully solved problems. A lot of work falls into that category, but some does not.
>> The most clever lines of code are the ones you don’t write.
Just to add, I think there are three things that LLMs don't address here, but maybe it's because they're not being asked the broader questions:
1. What are some reasonable out-of-band alternatives to coding the thing I'm being asked to code?
2. What kind of future modifications might the client want, and how can we ensure this mod will accommodate those without creating too many new constraints, but also without over-preparing for something that might not happen?
3. What is the client missing that we're also missing? This could be as simple as forgetting that under some circumstances, the same icon is being used in a UI to mean something else. Or that an error box might obscure the important thing that just triggered the error. Or that six years ago, we created a special user level called "-1" that is a reserved level for employees in training, and users on that level can't write to certain tables. And asking the question whether we want them to be able to train on the new feature, and if so, whether there are exceptions to that which would open the permissions on the DB but restrict some operations in the middleware.
"What are we missing" is 95% of my job, and unit tests are useless if you don't know all the potential valid or invalid inputs.
I think this is a fair and valuable comment. Only part I think could be more nuanced is:
> The fact that I wouldn't trust any LLM to touch any of my code in those real world cases makes me think that most people who are touting them are not, in fact, writing code at the same level or doing the same job I do. Or understand it very well.
I agree with this specifically for agentic LLM use. However, I've personally increased my code speed and quality with LLMs for sure using purely local models as a really fancy auto complete for 1 or 2 lines at a time.
The rest of your comment is good, but the last paragraph to me reads like someone inexperienced with LLMs looking to find excuses to justify not being productive with them, when others clearly are. Sorry.
Being effective with llm agents requires not just the ability to code or to appreciate nuance with libraries or business rules but to have the ability and proclivity of pedantry. Dad-splain everything always.
And to have boundless contextual awareness… dig a rabbit hole, but beware that you are in your own hole. At this point you can escape the hole but you have to be purposefully aware of what guardrails and ladders you give the agent to evoke action.
The better, more explicit guardrails you provide the more likely the agent is able to do what is expected and honor the scope and context you establish. If you tell it to use silverware to eat, be assured it doesn’t mean to use it appropriately or idiomatically and it will try eating soup with a fork.
Lastly don’t be afraid of commits and checkpoints, or to reject/rollback proposed changes and restate or reset the context. The agent might be the leading actor, but you are the director. When a scene doesn’t play out, try it again after clarification or changing camera perspective or lighting or lines, or cut/replace the scene entirely.
I find that level of pedantry and hand-holding to be extremely tedious, and I frequently find myself just thinking fuck it, I'll write it myself and get what I want the first time.
This. That's why every programmer strives for a good architecture and writes tests. When you have that, and all your bug fixes and feature requests are only a small number of lines, that is pure bliss. Even if it requires hours of reading and designing. Anything is better than dumping lots of lines.
Because once you figure out the correct way to handhold, you can automate it and the tediousness goes away.
It’s only tedious once per codebase or task, then you find the less tedious recipe and you’re done.
You can even get others to do the tedious part at their layer of abstraction so that you don't have to anymore. Same as compilers, CPU design, or any other part of the stack lower than the one you're using.
To be honest you sound super defensive, not just in the classic programmer-whose-turf-is-being-invaded sort of way, but also in the classic way of people who are reluctant to accept a new technology.
This sentiment of, a human will always be needed, there’s no replacement for human touch, the stakes are too high, is as old as time
You just said, quite literally, that people leveraging LLMs to code are not doing it at your level. That borders on hubris.
The fact of the matter is that like most tools, you get out of AI what you put into it
I know a lot of engineers and this pride, this reluctance to accept the help is super common
The best engineers on the other hand are leveraging this just fine, just another tool for them that speeds things up
Worth noting that there are business leaders who see high LOC and number of commits as metrics of good programmers. To them the 2000 LOC commits from offshore are proof that it's working. Sadly the proof that it's not will show in their sales and customer satisfaction if they keep producing their product long enough. For too long the business model in tech has been to get bought out so this doesn't often matter to business.
I'm not sure what any of what you just wrote has to do with LLMs. If you use LLMs to rubber duck or write tests/code, then all of the things you mentioned should still apply. That last logical leap, the fact that _you_ wouldn't trust LLM to touch your code means that people who do aren't at the same level as you is a fallacy.
At their peril, because any set of rules, no matter how seemingly simple, has edge cases that only become apparent once we take on the task of implementing them at the code level into a functioning app. And that's assuming specs have been written up by someone who has made every effort to consider every relevant condition, which is never the case.
And the example of "why" this 401 is happening is another one of those. The spec might have said to return a 401 both for not being authenticated and for not having enough privileges.
But that's just plain wrong and a proper developer would be allowed to change that. If you're not authenticating properly, you get a 401. That means you can't prove you're who you say you are.
If you are past that, i.e. we know that you are who you say you are, then the proper return code is 403 for saying "You are not allowed to access what you're trying to access, given who you are".
Which funnily enough seems to be a very elusive concept to many humans as well, never mind an LLM.
...then there are the other fun ones, like not wanting to tell people things exist that they don't have access to, like Github returning 404 errors for private repositories you know exist when you aren't logged into an account that has access to them.
That one at least makes sense if you ask me. It's not just Github doing it. On the web side of things you'd return the same "no such thing here" page whether you don't have access or it really doesn't exist. So leaking more info via the status code than the page you return in the browser would show would not be good.
That is, it would be the appropriate thing to do if you're trying to prevent leakage of information, i.e. enumeration of resources. But you still should not return 401 for this. A 404 is the appropriate response for pretending "it's just not there", if you ask me. You can't return 404 when it's not there and 403 when you have no access, if enumeration is a concern.
So for example, if you don't have access to say the settings of a repo you have access to, a 403 is OK. No use pretending with a 404, because we all know the settings are just a feature of Github.
However, pretending that a repo you don't have access to but exists isn't there with a 404 is appropriate because otherwise you could prove the existence of "superSecretRepo123" simply by guessing and getting a 403 instead of a 404.
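A minimal sketch of that policy; the tiny data model and the `status_for` helper are made up purely for illustration:

```python
# Sketch of the status-code policy described above; the data model
# (a dict of repos and their permissions) is hypothetical.
REPOS = {
    "acme/website":            {"private": False, "readers": set(),     "admins": {"alice"}},
    "acme/superSecretRepo123": {"private": True,  "readers": {"alice"}, "admins": {"alice"}},
}

def status_for(user, repo, wants_settings=False):
    meta = REPOS.get(repo)
    can_read = meta is not None and (not meta["private"] or user in meta["readers"])
    if not can_read:
        # Pretend it isn't there: a 403 would confirm that
        # "superSecretRepo123" exists to anyone who guesses the name.
        return 404
    if wants_settings:
        if user is None:
            return 401  # you haven't proven who you are at all
        if user not in meta["admins"]:
            return 403  # settings are a known feature of every repo: nothing to hide
    return 200

assert status_for(None,    "acme/superSecretRepo123") == 404   # not 403
assert status_for("bob",   "acme/superSecretRepo123") == 404   # not 403
assert status_for(None,    "acme/website", wants_settings=True) == 401
assert status_for("bob",   "acme/website", wants_settings=True) == 403
assert status_for("alice", "acme/website", wants_settings=True) == 200
```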
> That seems very dependent on which company you work for. Many would not grant you that kind of flexibility.
It really boils down to what scenario you have in mind. Developers do interact with product managers and discussions do involve information flowing both ways. Even if a PM ultimately decides what the product should do, you as a developer have a say in the process and outcome.
Also, there are always technological constraints, and sometimes even practical constraints are critical. A PM might want to push this or that feature, but if it's impossible to deliver by a specific deadline they have no alternative but to compromise, and the compromise is determined by what developers call out.
The majority of places I've worked don't adjust business rules on the fly because of flexibility. They do it because "we need this out the door next month". They need to ship and ship now. Asking clarifying questions at some of these dumpster fires is actually looked down upon, much less taking the time to write or even informally have a spec.
How does that work in an AI-supported development process? I'm a bit out of the loop since I left the industry. Usually there is a lot of back and forth over things like which fields go in a form, and whether asking for a last name will impact the conversion rate and so on.
But all the senior business folks think AI can do no wrong and want to put it out the door anyway, assuming all the experienced engineers are just trying to get more money or something.
This is a very common statement but doesn't match my experience at all, unless you expand "business rules" to mean "not code already".
There's plenty of that work, and it goes by many names ("enterprise", others).
But lots and lots and lots of programmers are concerned with using computers for computations: making things with the new hardware that you couldn't with the old hardware being an example. Embedded, cryptography, graphics, simulation, ML, drones and compilers and all kinds of stuff are much more about resources than business logic.
You can define business logic broadly enough to cover anything, I guess, but at some point it's no longer what you meant by it.
Yes although many software engineers try as hard as possible to avoid learning what the business problem is. In my experience though those people never make great engineers.
Often those of us that do want to learn what the business problem is are not allowed to be involved in those discussions, for various reasons. Sometimes it's "Oh we can take care of that so you don't have to deal with it," and sometimes it's "Just build to this design/spec" and they're not used to engineers (the good ones) questioning things.
I had a professor in grad school, Computer Engineering, that begged me not to get an MBA--he had worked in industry, particularly defense, and had a very low opinion of MBAs. I tend to agree nowadays. I really think the cookie-cutter "safe" approach that MBA types take, along with them maximizing profits using data science tools, has made the USA a worse place overall.
My problem was that the business problems were so tough on most of the gigs I had that it was next to impossible to build a solution for them! Dealing with medical claims in real time at volume was horrendous.
Understanding the business problem or goal is actually the context for correctly writing code. Without it, you start acting like an LLM that didn't receive all the necessary code to solve a task.
When a non-developer writes code with an LLM, their ability to write good code decreases. But at the same time, it goes up thanks to more "business context."
In a year or two, I imagine that a non-developer with a proper LLM may surpass a vanilla developer.
They usually code for the happy path, and add edge cases as bugs are discovered in production. But after a while both the happy path and the edge cases blend into a ball of mud that you need the correct incantation to get running. And it's a logic maze that contradicts every piece of documentation you can find (tickets, emails). Then it quickly becomes something that people don't dare to touch.
I guess that really is a thing, eh? That concept is pretty foreign to me. How on earth are you supposed to do domain modelling if you don't understand the domain?
Nearly 100%. They don't call it that or use that term, and almost never _design_ thinking about the domain. But the absence of a formal 'domain model' still results in domain modeling - it's just done at the level of IC who may or may not have any awareness of the broader implications of the model they are creating.
>Software engineers are able to step back, think about the whole thing, and determine the root cause of a problem.
Agree strongly, and I think this is basically what the article is saying as well about keeping a mental model of requirements/code behavior. We kind of already knew this was the hard part. How many times have you heard that once you get past junior level, the hard part is not writing the code? And that it's knowing what code to write? This realization is practically a rite of passage.
Which kind of raises the question of what the software engineering job looks like in the future. It definitely depends on how good the AI is. In the most simplistic case, AI can do all the coding right now and all you need is a task issue. And frankly, probably a user-written (or at least reviewed, but probably written) test. You could make the issue and test upfront, farm out the PR to an agent, and manually approve when you see it passed the test case you wrote.
In that case you are basically PM and QA. You are not even forming the prompt, just detailing the requirements.
But as the tech improves, can all tasks fit into that model? Not design/architecture tasks, or at least not without a different task-completion model than the one described above. The window will probably grow, but it's hard to imagine that it will handle all pure coding tasks. Even for large tasks that theoretically can fit into that model, you are going to have to do a lot of thinking and testing and prototyping to figure out the requirements and test cases. In theory you could apply the same task/test process, but that seems like it would be too much structure and indirection to actually be helpful compared to knowing how to code.
What if LLMs get 'a mental model of requirements/code behavior'? An LLM may have experts in it, each with its own specialty. You can even combine several LLMs, each doing its own thing: one creates architecture, another writes documentation, a third critiques, a fourth writes code, a fifth creates and updates the "mental model," etc.
I agree with the PM role, but with such low requirements that anyone can do it.
No. That's the narrow definition of a code monkey who gets told what to do.
The good ones wear multiple hats and actually define the problem, learn enough about a domain to interact with it or with the experts on said domain, and figure out the short- vs long-term tradeoffs to focus on the value and not just the technical aspect.
I wouldn't say "translating", but "finding/constructing a model that satisfies the business rules".
This can be quite hard in some cases, in particular if some business rules are contradicting each other or can be combined in surprisingly complex ways.
But software architects (especially of various reusable frameworks) have to maintain the right set of abstractions and make sure the system is correct and fast, easy to debug, that developers fall into the pit of success etc.
Here are just a few major ones, each of which would be a chapter in a book I would write about software engineering:
ENVIRONMENTS & WORKFLOWS
Environment Setup
Set up a local IDE with a full clone of the app (frontend, backend, DB).
Use .env or similar to manage config/secrets; never commit them.
Debuggers and breakpoints are more scalable than console.log.
Prefer conditional or version-controlled breakpoints in feature branches.
Test & Deployment Environments
Maintain at least 3 environments: Local (dev), Staging (integration test), Live (production).
Make state cloning easy (e.g., DB snapshots or test fixtures).
Use feature flags to isolate experimental code from production.
BUGS & REGRESSIONS
Bug Hygiene
Version control everything except secrets.
Use linting and commit hooks to enforce code quality.
A bug isn’t fixed unless it’s reliably reproducible.
Encourage bug reporters to reset to clean state and provide clear steps.
Fix in Context
Keep branches showing the bug, even if it vanishes upstream.
Always fix bugs in the original context to avoid masking root causes.
EFFICIENCY & SCALE
Lazy & On-Demand
Lazy-load data/assets unless profiling suggests otherwise.
Use layered caching: session, view, DB level.
Always bound cache size to avoid memory leaks.
Pre-generate static pages where possible—static sites are high-efficiency caches.
Avoid I/O
Use local computation (e.g., HMAC-signed tokens) over DB hits.
Encode routing/logic decisions into sessionId/clientId when feasible.
Partitioning & Scaling
Shard your data; that’s often the bottleneck.
Centralize the source of truth; replicate locally.
Use multimaster sync (vector clocks, CRDTs) only when essential.
Aim for O(log N) operations; allow O(N) preprocessing if needed.
CODEBASE DESIGN
Pragmatic Abstraction
Use simple, obvious algorithms first—optimize when proven necessary.
Producer-side optimization compounds through reuse.
Apply the 80/20 rule: optimize for the common case, not the edge.
Async & Modular
Default to async for side-effectful functions, even if not awaited (in JS).
Namespace modules to avoid globals.
Autoload code paths on demand to reduce initial complexity.
Hooks & Extensibility
Use layered architecture: Transport → Controller → Model → Adapter.
Add hookable events for observability and customization.
Wrap external I/O with middleware/adapters to isolate failures.
SECURITY & INTEGRITY
Input Validation & Escaping
Validate all untrusted input at the boundary.
Sanitize input and escape output to prevent XSS, SQLi, etc.
Apply defense-in-depth: validate client-side, then re-validate server-side.
Session & Token Security
Use HMACs or signatures to validate tokens without needing DB access (see the sketch after this list).
Enable secure edge-based filtering (e.g., CDN rules based on token claims).
Tamper Resistance
Use content-addressable storage to detect object integrity.
Append-only logs support auditability and sync.
INTERNATIONALIZATION & ACCESSIBILITY
I18n & L10n
Externalize all user-visible strings.
Use structured translation systems with context-aware keys.
Design for RTL (right-to-left) languages and varying plural forms.
Accessibility (A11y)
Use semantic HTML and ARIA roles where needed.
Support keyboard navigation and screen readers.
Ensure color contrast and readable fonts in UI design.
GENERAL ENGINEERING PRINCIPLES
Idempotency & Replay
Handlers should be idempotent where possible.
Design for repeatable operations and safe retries.
Append-only logs and hashes help with replay and audit.
Developer Experience (DX)
Provide trace logs, debug UIs, and metrics.
Make it easy to fork, override, and simulate environments.
Build composable, testable components.
ADDITIONAL TOPICS WORTH COVERING
Logging & Observability
Use structured logging (JSON, key-value) for easy analysis.
Tag logs with request/session IDs.
Separate logs by severity (debug/info/warn/error/fatal).
Configuration Management
Use environment variables for config, not hardcoded values.
Support override layers (defaults → env vars → CLI → runtime).
Ensure configuration is reloadable without restarting services if possible.
Continuous Integration / Delivery
Automate tests and checks before merging.
Use canary releases and feature flags for safe rollouts.
Keep pipelines fast to reduce friction.
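To make the "Session & Token Security" item above concrete, here is a minimal standard-library sketch of validating tokens without a DB hit. The token layout, claim names and TTL are illustrative choices, not a spec; in production you would typically reach for an established format such as JWT:

```python
# Minimal sketch: issue and verify HMAC-signed tokens with no database lookup.
# Token layout and claim names are illustrative assumptions, not a standard.
import base64, hashlib, hmac, json, time

SECRET = b"server-side-secret-rotate-me"

def issue_token(claims: dict) -> str:
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"

def verify_token(token: str):
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):    # constant-time comparison
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims.get("exp", 0) < time.time():        # reject expired tokens
        return None
    return claims                                  # no database involved

tok = issue_token({"sub": "alice", "role": "editor", "exp": time.time() + 3600})
assert verify_token(tok)["sub"] == "alice"
assert verify_token(tok + "tampered") is None
```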
> a book I would write about software engineering:
You should probably go do that, rather than using the comment section of HN as a scratch pad of your stream of consciousness. That's not useful to anyone other than yourself.
> How would you know if it adds nothing of value if you stopped reading it? :)
If you write a wall of text where the first pages are inane drivel, what do you think are the odds that the rest of that wall of text suddenly adds readable gems?
Sometimes a turd is just a turd, and you don't need to analyze all of it to know the best thing to do is to flush it.
It really isn't. There is no point to pretend it is, and even less of a point to expect anyone should waste their time with an unreadable and incoherent wall of text.
You decide how you waste your time, and so does everyone else.
1. Set up a local IDE with a full clone of the app (frontend, backend, DB).
Thus the app must be fully able to run on a small, local environment, which is true of open source apps but not always for for-profit companies
2. Use .env or similar to manage config/secrets; never commit them.
A lot of people don’t properly exclude secrets from version control, leading to catastrophic secret leaks. Also when everyone has their own copy, the developer secrets and credentials aren’t that important.
3. Debuggers and breakpoints are more scalable than console.log. Prefer conditional or version-controlled breakpoints in feature branches.
A lot of people don’t use debuggers and breakpoints, instead doing logging. Also they have no idea how to maintain DIFFERENT sets of breakpoints, which you can do by checking the project files into version control, and varying them by branches.
4. Test & Deployment Environments Maintain at least 3 environments: Local (dev), Staging (integration test), Live (production).
This is fairly standard advice, but it is best practice, so people can test in local and staging.
5. Make state cloning easy (e.g., DB snapshots or test fixtures).
This is not trivial. For example, downloading a local copy of a test database, to test your local copy of Facebook with a production-style database. Make it fast, e.g. by rsyncing MySQL InnoDB files.
> We don't see the debugger output and go "how do I make this bad output go away?!?"
In the past, I've worked with developers that do. You ask them to investigate and deal with an error message, and all they do is whatever makes the error go away. Oh, a null pointer exception is thrown? Let's wrap it in a try/catch and move on.
If you haven't worked with someone who you honestly think may have a severe learning disability or outright brain damage, you just haven't been working professionally for long enough.
The first cars broke down all the time. They had a limited range. There wasn't a vast supply of parts for them. There wasn't a vast industry of experts who could work on them. There wasn't a vast network of fuel stations to provide energy for them. The horse was a proven method.
What an LLM cannot do today is almost irrelevant in the tide of change upon the industry. The fact is, with improvements, it doesn't mean an LLM cannot do it tomorrow.
The difference is that the weaknesses of cars were problems of engineering, and some of infrastructure. Both aren't very hard to solve, though they take time. The fundamental way cars operated worked and just needed revision, sanding off rough edges.
LLMs are not like this. The fundamental way they operate, the core of their design is faulty. They don't understand rules or knowledge. They can't, despite marketing, really reason. They can't learn with each interaction. They don't understand what they write.
All they do is spit out the most likely text to follow some other text based on probability. For casual discussion about well-written topics, that's more than good enough. But for unique problems in a non-English language, it struggles. It always will. It doesn't matter how big you make the model.
They're great for writing boilerplate that has been written a million times with different variations - which can save programmers a LOT of time. The moment you hand them anything more complex it's asking for disaster.
> [LLMs] spit out the most likely text to follow some other text based on probability.
Modern coding AI models are not just probability-crunching transformers. They haven't been just that for some time. In current coding models the transformer bit is just one part of what is really an expert system. The complete package includes things like highly curated training data, specialized tokenizers, pre- and post-training regimens, guardrails, optimized system prompts etc., all tuned to coding. Put it all together and you get one-shot performance generating the type of code that was unthinkable even a year ago.
The point is that the entire expert system is getting better at a rapid pace and the probability bit is just one part of it. The complexity frontier for code generation keeps moving and there's still a lot of low hanging fruit to be had in pushing it forward.
> They're great for writing boilerplate that has been written a million times with different variations
That's >90% of all code in the wild. Probably more. We have three quarters of a century of code in our history, so there is very little that's original anymore. Maybe original to the human coder fresh out of school, but the models have all this history to draw upon. So if the models produce the boilerplate reliably, then human toil in writing if/then statements is at an end. Kind of like, barring the occasional mad genius [0], the vast majority of coders don't write assembly to create a website anymore.
> Modern coding AI models are not just probability crunching transformers. (...) The complete package includes things like highly curated training data, specialized tokenizers, pre and post training regimens, guardrails, optimized system prompts etc, all tuned to coding.
It seems you were not aware you ended up describing probabilistic coding transformers. Each and every single one of those details are nothing more than strategies to apply constraints to the probability distributions used by the probability crunching transformers. I mean, read what you wrote: what do you think that "curated training data" means?
> Put it all together and you get one shot performance on generating the type of code that was unthinkable even a year ago.
>The complete package includes things like highly curated training data, specialized tokenizers, pre and post training regimens, guardrails, optimized system prompts etc, all tuned to coding.
And even with all that, they still produce garbage way too often. If we continue the "car" analogy, the car would crash randomly sometimes when you leave the driveway, and sometimes it would just drive into the house. So you add all kinds of fancy bumpers to the car and guard rails to the roads, and the car still runs off the road way too often.
What we should do and what we are forced to do are very different things. If I can get a machine to do the stuff I hate dealing with, I'll take it every time.
After a while, it just makes sense to redesign the boilerplate and build some abstraction instead. Duplicated logic and data are hard to change and fix. The frustration is a clear signal to take a step back and take a holistic view of the system.
And this is a great example of something I rarely see LLMs doing. I think we're approaching a point where we will use LLMs to manage code the way we use React to manage the DOM. You need an update to a feature? The LLM will just recode it wholesale. All of the problems we have in software development will dissolve in mountains of disposable code. I could see enterprise systems being replaced hourly for security reasons. Less chance of abusing a vulnerability if it only exists for an hour to find and exploit. Since the popularity of LLMs proves that as a society we've stopped caring about quality, I have a hard time seeing any other future.
>In current coding models the transformer bit is just one part of what is really an expert system. The complete package includes things like highly curated training data, specialized tokenizers, pre and post training regimens, guardrails, optimized system prompts etc, all tuned to coding. Put it all together and you get one shot performance on generating the type of code that was unthinkable even a year ago.
This is lipstick on a pig. All those methods are impressive, but ultimately workarounds for an idea that is fundamentally unsuitable for programming.
>That's >90% of all code in the wild. Probably more.
Maybe, but not 90% of time spent on programming. Boilerplate is easy. It's the 20%/80% rule in action.
I don't deny these tools can be useful and save time - but they can't be left to their own devices. They need to be tightly controlled and given narrow scopes, with heavy oversight by an SME who knows what the code is supposed to be doing. "Design W module with X interface designed to do Y in Z way", keeping it as small as possible and reviewing it to hell and back. And keeping it accountable by making tests yourself. Never let it test itself, it simply cannot be trusted to do so.
LLMs are incredibly good at writing something that looks reasonable, but is complete nonsense. That's horrible from a code maintenance perspective.
> For casual discussion about well-written topics, that's more than good enough. But for unique problems in a non-English language, it struggles. It always will. It doesn't matter how big you make the model.
Not to disagree, but "non-english" isn't exactly relevant. For unique problems, LLMs can still manage to output hallucinations that end up being right or useful. For example, LLMs can predict what an API looks like and how it works even if they do not have the API in context if the API was designed following standard design principles and best practices. LLMs can also build up context while you interact with them, which means that iteratively prompting them that X works while Y doesn't will help them build the necessary and sufficient context to output accurate responses.
This is the first word that came to mind when reading the comment above yours. Like:
>They can't, despite marketing, really reason
They aren't, despite marketing, really hallucinations.
Now I understand why these companies don't want to market using terms like "extrapolated bullshit", but I don't understand how there is any technological solution to it without starting from a fresh base.
> They aren't, despite marketing, really hallucinations.
They are hallucinations. You might not be aware of what that concept means in terms of LLMs, but just because you are oblivious to the definition of a concept, that does not mean it doesn't exist.
You can learn about the concept by spending a couple of minutes reading this article on Wikipedia.
> You might not be aware of what that concept means in terms of LLMs
GP is perfectly aware of this, and disagrees that the metaphor used to apply the term is apt.
Just because you use a word to describe a phenomenon doesn't actually make the phenomenon similar to others that were previously described with that word, in all the ways that everyone will find salient.
When AIs generate code that makes a call to a non-existent function, it's not because they are temporarily mistakenly perceiving (i.e., "hallucinating") that function to be mentioned in the documentation. It's because the name they've chosen for the function fits their model for what a function that performs the necessary task might be called.
And even that is accepting that they model the task itself (as opposed to words and phrases that describe the task) and that they somehow have the capability to reason about that task, which has somehow arisen from a pure language model (whereas humans can, from infancy, actually observe reality, and contemplate the effect of their actions upon the real world around them). Knowing that e.g. the word "oven" often follows the word "hot" is not, in fact, tantamount to understanding heat.
In short, they don't perceive, at all. So how can they be mistaken in their perception?
Irrelevant. Wikipedia does not create concepts. Again, if you take a few minutes to learn about the topic you will eventually understand the concept was coined a couple of decades ago, and has a specific meaning.
Either you opt to learn, or you don't. Your choice.
> Here's the first linked source:
Irrelevant. Your argument is as pointless and silly as claiming rubber duck debugging doesn't exist because no rubber duck is involved.
>calling their mistakes ‘hallucinations’ isn’t harmless: it lends itself to the confusion that the machines are in some way misperceiving but are nonetheless trying to convey something that they believe or have perceived.
What an enlightening input. I will now follow another source, 'Why ChatGPT and Bing Chat are so good at making things up'
>In academic literature, AI researchers often call these mistakes "hallucinations." But that label has grown controversial as the topic becomes mainstream because some people feel it anthropomorphizes AI models (suggesting they have human-like features) or gives them agency (suggesting they can make their own choices) in situations where that should not be implied. The creators of commercial LLMs may also use hallucinations as an excuse to blame the AI model for faulty outputs instead of taking responsibility for the outputs themselves.
>Still, generative AI is so new that we need metaphors borrowed from existing ideas to explain these highly technical concepts to the broader public. In this vein, we feel the term "confabulation," although similarly imperfect, is a better metaphor than "hallucination." In human psychology, a "confabulation" occurs when someone's memory has a gap and the brain convincingly fills in the rest without intending to deceive others. ChatGPT does not work like the human brain, but the term "confabulation" arguably serves as a better metaphor because there's a creative gap-filling principle at work
It links to a tweet from someone called 'Yann LeCun':
>Future AI systems that are factual (do not hallucinate)[...] will have a very different architecture from the current crop of Auto-Regressive LLMs.
That was an interesting diversion, but let's go back to learning more. How about 'AI Hallucinations: A Misnomer Worth Clarifying'?
>Maleki, N., Padmanabhan, B. and Dutta, K. (2024). AI Hallucinations: A Misnomer Worth Clarifying. 2024 IEEE Conference on Artificial Intelligence (CAI). doi:https://doi.org/10.1109/cai59869.2024.00033.
Maleki et al. say:
>As large language models continue to advance in Artificial Intelligence (AI), text generation systems have been shown to suffer from a problematic phenomenon often termed as "hallucination." However, with AI’s increasing presence across various domains, including medicine, concerns have arisen regarding the use of the term itself. [...] Our results highlight a lack of consistency in how the term is used, but also help identify several alternative terms in the literature.
Wow, how interesting! I'm glad I opted to learn that!
My fun was spoiled though. I tried following a link to the 1995 paper, but it was SUPER BORING because it didn't say 'hallucinations' anywhere! What a waste of effort, after I had to go to those weird websites just to be able to access it!
I'm glad I got the opportunity to learn about Hallucinations (Artificial Intelligence) and how they are meaningfully different from bullshit, and how they can be avoided in the future. Thank you!
> how so? programs might use english words but are decidedly not english.
I pointed out the fact that the concept of a language doesn't exist in token predictors. They are trained with a corpus, and LLMs generate outputs that reflect how the input is mapped in accordance with how they were trained on said corpus. Natural language makes the problem harder, but not being English is only relevant in terms of what corpus was used to train them.
How can you tell a human actually understands? Prove to me that human thought is not predicting the most probable next token. If it quacks like duck. In psychology research the only way to research if a human is happy is to ask them.
Does speaking in your native language, speaking in a second language, thinking about your life and doing maths feel exactly the same to you?
> Prove to me that human thought is not predicting the most probable next token.
Explain the concept of color to a completely blind person. If their brain does nothing but process tokens this should be easy.
> How can you tell a human actually understands?
What a strange question coming from a human. I would say if you are a human with a consciousness you are able to answer this for yourself, and if you aren't no answer will help.
Oh, I dunno. The whole "mappers vs packers" and "wordcels vs shape rotators" dichotomies point at an underlying truth, which is that humans don't always actually understand what they're talking about, even when they're saying all the "right" words. This is one reason why tech interviewing is so difficult: it's partly a task of figuring out if someone understands, or has just learned the right phrases and superficial exercises.
That is ill-posed. Take any algorithm at all, e.g. a TSP solver. Make a "most probable next token predictor" that takes the given traveling salesman problem, runs the solver, and emits the first token of the solution, then reruns the solver and emits the next token, and so on.
By this thought experiment you can make any computational process into "predict the most probable next token" - at an extreme runtime cost. But if you do so, you arguably empty the concept "token predictor" of most of its meaning. So you would need to more accurately specify what you mean by a token predictor so that the answer isn't trivially true (for every kind of thought that's computation-like).
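Here is a toy version of that thought experiment; the nearest-neighbour tour below is a stand-in for a real TSP solver, purely for illustration:

```python
# Toy version of the thought experiment above: any deterministic solver can be
# wrapped as a "most probable next token predictor", one character at a time.
import math

def solve_tsp(cities: dict[str, tuple[float, float]]) -> str:
    """Greedy nearest-neighbour tour, returned as a string of city names."""
    names = list(cities)
    tour, remaining = [names[0]], set(names[1:])
    while remaining:
        last = cities[tour[-1]]
        tour.append(min(remaining, key=lambda c: math.dist(last, cities[c])))
        remaining.remove(tour[-1])
    return "->".join(tour)

def next_token(cities, emitted: str):
    """'Predict' the next character by re-running the whole solver each time."""
    full = solve_tsp(cities)
    return full[len(emitted)] if len(emitted) < len(full) else None

cities = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 5)}
out = ""
while (tok := next_token(cities, out)) is not None:
    out += tok          # one "token" at a time: wasteful, but equivalent
assert out == solve_tsp(cities)
print(out)              # A->B->C->D
```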
I feel this way about Typescript too. There are a lot of people in engineering these days who don't think critically or exercise full observation when using popular technologies. I don't feel like it was like this 15 years ago, but it probably was...
This stops being an interesting philosophical problem when you recognise the vast complexity of animal brains that LLMs fail to replicate or substitute.
I stopped finding those arguments entertaining after a while. It always ends up "there's something that will always be missing, I just know it, but I won't tell you what. I'm just willing to go round and round in circles."
People who are certain that computers can't replicate human-level intelligence aren't being intellectually rigorous. The same applies to people who are certain computers can replicate human-level intelligence.
We can make arguments for informed guesses but there are simply still too many unknowns to be certain either way. People who claim to be certain are just being presumptuous.
> The same applies to people who are certain computers can replicate human level intelligence.
that's the thing, I'm not certain that "computers" can replicate human level intelligence. for one that statement would have to include a rigorous definition of what a computer is and what is excluded.
no, I just don't buy the idea that human level intelligence is only achievable in human born meatbags. at this point the only evidence has been "look, birds flap their wings and man doesn't have wings, therefore man will never fly".
If we could design a human would we design them with menstrual cycles? Why would we even target human intelligence. Feels like setting the bar low and not being very creative...
Seriously, the human brain is susceptible to self stroking patterns that result in disordered thinking. We spend inordinate amounts of energy daydreaming, and processing visual and auditory stimulus. We require sleep and don't fully understand why. So why would we target human intelligence? Propaganda. Anyone worried about losing their livelihood to automation is going to take notice. AI has the place in the zeitgeist today that Robots occupied in the 1980s and for the same reason. The wealthy and powerful can see the power it has socially right now and they are doing whatever they can to leverage it. It's why they don't call it LLMs but AI because AI is scarier. It's why all the tech bro CEOs signed the "pause" letter.
If this was about man flying we would be making an airplane instead of talking about how the next breakthrough will make us all into angels. LLMs are clever inventions they're just not independently clever.
Realistically, most people are not going to be on top-end models. They're expensive, and will get far, far FAR more expensive once these companies feel they're sufficiently entrenched to crank up pricing.
It's basically the Silicon Valley playbook to offer a service for dirt cheap (completely unprofitable) and then, once they secure the market, make the price skyrocket.
A mid range model is what most people will be able to use.
If you want to know what's possible, you look at the frontier models. If you want it cheap, you wait a year and it'll get distilled into the cheaper ones.
> LLMs are not like this. The fundamental way they operate, the core of their design is faulty. They don't understand rules or knowledge. They can't, despite marketing, really reason. They can't learn with each interaction. They don't understand what they write.
Said like a true software person. I'm to understand that computer people are looking at LLMs from the wrong end of the telescope; and that from a neuroscience perspective, there's a growing consensus among neuroscientists that the brain is fundamentally a token predictor, and that it works on exactly the same principles as LLMs. The only difference between a brain and an LLM may be the size of its memory, and what kind and quality of data it's trained on.
>from a neuroscience perspective, there's a growing consensus among neuroscientists that the brain is fundamentally a token predictor, and that it works on exactly the same principles as LLMs
Hahahahahaha.
Oh god, you're serious.
Sure, let's just completely ignore all the other types of processing that the brain does. Sensory input processing, emotional regulation, social behavior, spatial reasoning, long and short term planning, the complex communication and feedback between every part of the body - even down to the gut microbiome.
The brain (human or otherwise) is incredibly complex and we've barely scraped the surface of how it works. It's not just neurons (which are themselves complex), it's interactions between thousands of types of cells performing multiple functions each. It will likely be hundreds of years before we get a full grasp on how it truly works, if we ever do at all.
> The only difference between a brain and an LLM maybe the size of its memory, and what kind and quality of data it's trained on.
This is trivially proven false, because LLMs have far larger memory than your average human brain and are trained on far more data. Yet they do not come even close to approximating human cognition.
I feel like we're underestimating how much data we as humans are exposed to. There's a reason AI struggles to generate an image of a full glass of wine. It has no concept of what wine is. It probably knows way more theory about it than any human, but it's missing the physical.
In order to train AIs the way we train ourselves, we'll need to give it more senses, and I'm no data scientist but that's presumably an inordinate amount of data. Training AI to feel, smell, see in 3D, etc is probably going to cost exponentially more than what the AI companies make now or ever will. But that is the only way to make AI understand rather than know.
We often like to state how much more capacity for knowledge AI has than the average human, but in reality we are just underestimating ourselves as humans.
I think this conversation is dancing around the relationship of memory and knowledge. Simply storing information is different than knowing it. One of you is thinking book learning while the other is thinking street smarts.
> and that from a neuroscience perspective, there's a growing consensus among neuroscientists that the brain is fundamentally a token predictor, and that it works on exactly the same principles as LLMs
Can you cite at least one recognized, credible neuroscientist who makes this claim?
Look you don't have to lie at every opportunity you get. You are fully aware and know what you've written is bullshit.
Tokens are a highly specific, transformer-exclusive concept. The human brain doesn't run a byte pair encoding (BPE) tokenizer [0] in its head or process anything as tokens. It uses asynchronous, time-varying, spiking analog signals. Humans are the inventors of human languages and are not bound to any static token encoding scheme, so this view of what humans do as "token prediction" requires a gross misrepresentation of either what a token is or what humans do.
If I had to argue that humans are similar to anything in machine learning research specifically, I would have to argue that they extremely loosely follow the following principles:
* reinforcement learning with the non-brain parts defining the reward function (primarily hormones and pain receptors)
* an extremely complicated non-linear Kalman filter that not only estimates the current state of the human body, but also "estimates" the parameters of a sensor-fusion model
* a necessary projection of the sensor-fused result that then serves as available data/input to the reinforcement learning part of the brain
Now here are two big reasons why the model I describe is a better fit:
The first reason is that I am extremely loose and vague. By playing word games I have weaseled myself out of any specific technology and am on the level of concepts.
The second reason is that the Kalman filter concept here is general enough that it also includes predictor models, but the predictor model here is not the output that drives human action, because that would logically require the dataset to already contain human actions. That is what you did: you assumed that all learning is imitation learning.
In my model, any internal predictor model that is part of the Kalman filter is used to collect data, not to drive human action. Actions like eating or drinking are instead driven by the state of the human body, e.g. hunger is controlled through leptin, insulin and others. All forms of work, no matter how much of a detour they represent, ultimately have the goal of feeding yourself or your family (= reproduction).
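To make that loose analogy a bit more concrete, here is a minimal 1-D predict/correct loop in Python. Every number is invented for illustration; nothing biological is being modeled, it just shows how the "predictor" refines an estimate from noisy observations rather than choosing actions itself.

```python
# Minimal 1-D Kalman filter sketch: the predictor updates an internal state
# estimate from noisy observations; it does not itself decide what to do.
def kalman_step(x_est, p_est, z, q=0.01, r=0.5):
    # Predict: state assumed constant, uncertainty grows by process noise q.
    x_pred = x_est
    p_pred = p_est + q
    # Correct: blend the prediction with the new observation z.
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)  # updated state estimate
    p_new = (1 - k) * p_pred           # updated uncertainty
    return x_new, p_new

# Toy usage: noisy readings of a "true" value around 1.0.
readings = [1.2, 0.8, 1.1, 0.95, 1.05]
x, p = 0.0, 1.0
for z in readings:
    x, p = kalman_step(x, p, z)
print(x, p)  # the estimate converges toward 1.0 as uncertainty shrinks
```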
[0] A BPE tokenizer is a piece of human-written software that was given a dataset to generate an efficient encoding scheme; the idea itself is completely independent of machine learning and neural networks. The fundamental idea behind BPE is that you generate a static compression dictionary and never change it.
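A rough sketch of that idea, with a toy corpus and none of the details real tokenizers care about:

```python
from collections import Counter

def learn_bpe_merges(words, num_merges=3):
    """Toy BPE: repeatedly merge the most frequent adjacent symbol pair.
    The learned merge list is the static 'compression dictionary'."""
    vocab = [list(w) for w in words]          # each word as a list of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in vocab:
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)      # most frequent adjacent pair
        merges.append(best)
        # Apply the merge everywhere; the scheme never changes afterwards.
        new_vocab = []
        for symbols in vocab:
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab.append(merged)
        vocab = new_vocab
    return merges

print(learn_bpe_merges(["lower", "lowest", "low", "slow"]))
```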
We can reasonably speak about certain fundamental limitations of LLMs without those being claims about what AI may ever do.
I would agree they fundamentally lack models of the current task and that it is not very likely that continually growing the context will solve that problem, since it hasn't already. That doesn't mean there won't someday be an AI that has a model much as we humans do. But I'm fairly confident it won't be an LLM. It may have an LLM as a component but the AI component won't be primarily an LLM. It'll be something else.
Every AI-related invention is hyped as "intelligence" but turns out to be "Necessary but Not Sufficient" for true intelligence.
Neural networks are necessary but not sufficient. LLMs are necessary but not sufficient.
I have no doubt that there are multiple (perhaps thousands? more?) of LLM-like subsystems in our brains. They appear to be a necessary part of creating useful intelligence. My pet theory is that LLMs are used for associative memory purposes. They help generate new ideas and make predictions. They extract information buried in other memory. Clearly there is another system on top that tests, refines, and organizes the output. And probably does many more things we haven't even thought to name yet.
What do you mean? Most adult humans can learn to drive a car, book a plane ticket, get a passport, fly abroad, navigate in a foreign country etc. There is variation in human intelligence, but almost all humans are very intelligent compared to everything else we know about.
Not really, only "merchants" are trying to package and sell LLMs as "artificial intelligence". To this day AI still very much is the name of a research field focused on computational methods: it's not a discovery, it's not a singular product or tool at our disposal (or it is in no greater capacity than Markov chains, support vector machines or other techniques that came before). If you ever expect the goalposts to settle, you are essentially wishing for research to stop.
I don't think we fully understand all the aspects of intelligence. What the potential feature set is. How to categorize or break it down into parts. We have some data and some categories but we are so far away from a full description that it only makes sense we must move the goalposts constantly.
The premise that an AI needs to do Y "as we do" to be good at X because humans use Y to be good at X needs closer examination. This presumption seems to be omnipresent in these conversations and I find it so strange. Alpha Zero doesn't model chess "the way we do".
Both that, and that we should not expect LLMs to achieve ability with humans as the baseline comparison. It's as if cars were rapidly getting better due to some new innovation, and we expected them to fly within a year. It's a new and different thing, where the universality of "plausible-sounding" coherent text appeared to be general, when it's advanced pattern matching. Nothing wrong with that, pattern matching is extremely useful, but drawing the equal sign to human cognition is extremely premature, and a bet that is very likely to be wrong.
> The premise that an AI needs to do Y "as we do" to be good at X because humans use Y to be good at X needs closer examination.
I don't see it being used as a premise. I see it as speculation that is trying to understand why this type of AI underperforms at certain types of tasks. Y may not be necessary to do X well, but if a system is doing X poorly and the difference between that system and another system seems to be Y, it's worth exploring whether adding Y would improve the performance.
I have to disagree. Those who say LLMs do not qualify as AI are the same people who will continue to move the goal posts for AGI. "Well it doesn't do this!". No one here is trying to replicate a human brain or condition in its entirety. They just want to replicate the thinking ability of one. LLMs represent the closest parallel we have experienced thus far to that goal. Saying that LLMs are not AI feels disingenuous at best and purposely dishonest at worst (perhaps perceived as staving off the impending demise of a profession).
The sooner people stop worrying about a label for what you feel fits LLMs best, the sooner they can find the things they (LLMs) absolutely excel at and improve their (the user's) workflows.
Stop fighting the future. It's not replacing anyone right now. Later? Maybe. But right now the developers and users fully embracing it are experiencing productivity boosts unseen previously.
> the developers and users fully embracing it are experiencing productivity boosts unseen previously
This is the kind of thing that I disagree with. Over the last 75 years we’ve seen enormous productivity gains.
You think that LLMs are a bigger productivity boost than moving from physically rewiring computers to using punch cards, from running programs as batch processes with printed output to getting immediate output, from programming in assembly to higher level languages, or even just moving from enterprise Java to Rails?
Even learning your current $EDITOR and $SHELL can be a great productivity booster. I see people claiming AI is helping them while they hunt for files in the file manager tree instead of using `grep` or `find` (Unix).
The studies I've seen for AI actually improving productivity are a lot more modest than what the hype would have you believe. For example: https://www.youtube.com/watch?v=tbDDYKRFjhk
Skepticism isn't the same thing as fighting the future.
I will call something AGI when it can reliably solve novel problems it hasn't been pre-trained on. That's my goal post and I haven't moved it.
> For all intents and purposes of the public. AI == LLM. End of story. Doesn't matter what developers say.
This is interesting, because it's so clearly wrong. The developers are also the people who develop the LLMs, so obviously what they say is actually the factual matter of the situation. It absolutely does matter what they say.
But the public perception is that AI == LLM, agreed. Until it changes and the next development comes along, when suddenly public perception will change and LLMs will be old news, obviously not AI, and the new shiny will be AI. So not End of Story.
People are morons. Individuals are smart, intelligent, funny, interesting, etc. But in groups we're moronic.
>Do you produce good results every time, first try?
Almost always, yes, because I know what I'm doing and I have a brain that can think. I actually think before I do anything, which leads to good results. Don't assume everyone is a junior.
If you always use your first output then you are not a senior engineer: either your problem space is THAT simple that you can fit all of its context in your head at the same time on the first try, or quite frankly you just bodge things together in a non-optimal way.
It always takes some tries at a problem to grasp the edge cases and to visualize the problem space more easily.
Depends on how you define "try". If someone asks me to do something I don't come back with a buggy piece of garbage and say "here, I'm done!", the first deliverable will be a valid one, or I'll say I need more to do it.
When I'm confident something will work it almost always works, that is very different from these models.
Sure sometimes I do stuff I am not confident about to learn but then I don't say "here I solved the problem for you" without building confidence around the solution first.
Every competent senior engineer should be like this; if you aren't, then you aren't competent. If you are confident in a solution then it should almost always work, else you are overconfident and thus not competent. LLMs are confident in solutions that are shit.
In cybernetics, this label has existed for a long time.
Unfortunately, discourse has followed an epistemic trajectory influenced by Hollywood and science fiction, making clear communication on the subject nearly impossible without substantial misunderstanding.
> Anyone that says LLMs do not qualify as AI are the same people who will continue to move the goal posts for AGI.
I have the complete opposite feeling. The layman understanding of the term "AI" is AGI, a term that only needs to exist because researchers and businessmen hype their latest creations as AI.
The goalposts for AI don't move; the definition isn't precise, but we know it when we see it.
AI, to the layman, is Skynet/Terminator, Asimov's robots, Data, etc.
The goalposts moving that you're seeing is when something the tech bubble calls AI escapes the tech bubble and everyone else looks at it and says, no, that's not AI.
The problem is that everything that comes out of the research efforts toward AI, the tech industry calls AI despite it not achieving that goal by the common understanding of the term. LLMs were/are a hopeful AI candidate but, as of today, they aren't, and that doesn't stop OpenAI from trying to raise money using the term.
AI has had many, many lay meanings over the years. Simplistic decision trees and heuristics for video games is called AI. It is a loose term and trying to apply it with semantic rigour is useless, as is trying to tell people that it should only be used to match one of its many meanings.
If you want some semantic rigour use more specific terms like AGI, human equivalent AGI, super human AGI, exponentially self improving AGI, etc. Even those labels lack rigour, but at least they are less ambiguous.
LLMs are pretty clearly AI and AGI under commonly understood, lay definitions. LLMs are not human level AGI and perhaps will never be by themselves.
> LLMs are pretty clearly AI and AGI under commonly understood, lay definitions.
That's certainly not clear. For starters, I don't think there is a lay definition of AGI which is largely my point.
The only reason people are willing to call LLMs AI is because that's how they are being sold and the shine isn't yet off the rose.
How many people call Siri AI? It used to be but people have had time to feel around the edges where it fails to meet their expectations of AI.
You can tell what people think of AI by the kind of click bait surrounding LLMs. I read an article not too long ago with the headline about an LLM lying to try and not be turned off. Turns out it was intentionally prompted to do that but the point is that that kind of self preservation is what people expect of AI. Implicitly, they expect that AI has a "self".
AI and AGI are broad umbrella terms. Stuff like Alpha Zero is AI but not AGI while LLMs are both.
Engaging in semantic battles to try to change the meanings of those terms is just going to create more confusion, not less. Instead why not use more specific and descriptive labels to be clear about what you are saying.
Self-Aware AGI, Human Level AGI, Super-Human ANI, are all much more useful than trying to force a general label to be used in a specific way.
> I've never seen someone state, as fact, that LLMs are AGI before now.
Many LLMs are AI that weren't designed / trained to solve a narrow problem scope. They can complete a wide range of tasks with varying levels of proficiency. That makes them artificial general intelligence or AGI.
You are confused because lots of people use "AGI" as a shorthand to talk about "human level" AGI that isn't limited to a narrow problem scope.
It's not wrong to use the term this way, but it is ambiguous and vague.
Even the term "human level" is poorly defined and if I wanted to use the term "Human level AGI" for any kind of discussion of what qualifies, I'd need to specify how I was defining that.
I'm not confused at all. Your own personal definitions just further my point that tech people have a much different classification system than the general populace, and that the need for those excessive classifications exists because ambitious CEOs keep using the term incorrectly in order to increase share prices.
It's actually very funny to me that you are stating these definitions so authoritatively despite the terms not having any sort of rigor attached to either their definition or usage.
> It's actually very funny to me that you are stating these definitions so authoritatively despite the terms not having any sort of rigor attached to either their definition or usage.
Huh? My entire point was that AI and AGI are loose, vague terms and if you want to be clear about what you are talking about, you should use more specific terms.
> The sooner people stop worrying about a label for what you feel fits LLMs best, the sooner they can find the things they (LLMs) absolutely excel at and improve their (the user's) workflows.
This is not a fault of the users. These labels are pushed primarily by "AI" companies in order to hype their products to be far more capable than they are, which in turn increases their financial valuation. Starting with "AI" itself, "superintelligence", "reasoning", "chain of thought", "mixture of experts", and a bunch of other labels that anthropomorphize and aggrandize their products. This is a grifting tactic old as time itself.
From Sam Altman[1]:
> We are past the event horizon; the takeoff has started. Humanity is close to building digital superintelligence
Apologists will say "they're just words that best describe these products", repeat Dijkstra's "submarines don't swim" quote, but all of this is missing the point. These words are used deliberately because of their association to human concepts, when in reality the way the products work is not even close to what those words mean. In fact, the fuzzier the word's definition ("intelligence", "reasoning", "thought"), the more valuable it is, since it makes the product sound mysterious and magical, and makes it easier to shake off critics. This is an absolutely insidious marketing tactic.
The sooner companies start promoting their products honestly, the sooner their products will actually benefit humanity. Until then, we'll keep drowning in disinformation, and reaping the consequences of an unregulated marketplace of grifters.
>When the first cars broke down, people were not saying: One day, we’ll go to the moon with one of these.
maybe they should have; a lot of the engineering techniques and methodologies that produced the assembly line and the mass-produced vehicle also led the way into space exploration.
The article has a very nuanced point about why it’s not just a matter of today’s vs tomorrow’s LLMs. What’s lacking is a fundamental capacity to build mental models and learn new things specific to the problem at hand. Maybe this can be fixed in theory with some kind of on-the-fly finetuning, but it’s not just about more context.
You can give it some documents, or classroom textbooks, and it can turn those into RDF graphs, explaining what the main concepts are and how they are related. This can then be used by an LLM to solve other problems.
It can also learn new things using trial and error with MCP tools. Once it has figured out some problem, you can ask it to summarize the insights for later use.
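Roughly what I mean, as a sketch using rdflib (assuming it is installed; the namespace and concept names here are made up for illustration):

```python
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/")   # hypothetical namespace
g = Graph()

# Triples an LLM might extract from a textbook chapter on sorting.
g.add((EX.QuickSort, RDF.type, EX.SortingAlgorithm))
g.add((EX.QuickSort, EX.averageComplexity, Literal("O(n log n)")))
g.add((EX.QuickSort, EX.usesTechnique, EX.DivideAndConquer))
g.add((EX.QuickSort, RDFS.comment, Literal("Partitions around a pivot element.")))

# The serialized graph can then be fed back to the model as structured context.
print(g.serialize(format="turtle"))
```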
I’m not an expert on this, so I’m not familiar with what RDF graphs are, but I feel like everything you’re describing happens textually, and used as context? That is, it’s not at all ”learning” the way it’s learning during training, but by writing things down to refer to them later? As you say - ”ask it to summarize the insights for later use” - this is fundamentally different from the types of ”insights” it can have during training. So, it can take notes about your code and refer back to them, but it only has meaningful ”knowledge” about code it came across in training.
To me as a layman, this feels like a clear explanation of how these tools break down, why they start going in circles when you reach a certain complexity, why they make a mess of unusual requirements, and why they have such an incredible nuanced grasp of complex ideas that are widely publicized, while being unable to draw basic conclusions about specific constraints in your project.
To me it feels very much like a brain: my brain often lacks knowledge, but I can use external documents to augment it. My brain also has limitations in what it can remember; I hardly remember anything I learned in high school or university about science, chemistry, or math, so I need to write things down to bring back the knowledge later.
Text and words are the concepts we use to transfer knowledge in schools, across generations, etc. we describe concepts in words, so other people can learn these concepts.
Without words and text we would be like animals unable to express and think about concepts
The point isn’t that writing and reading aren’t useful. The point is that they’re different from forming new neurological connections as you familiarize yourself with a problem. LLMs, as far as I know, can’t do that when you use them.
Does that really matter if the result is the same? They have a brain, they have additional instructions, and with these they can achieve specified outcomes. It would be interesting to see how far we can shrink the brains and still get the desired outcomes with the right instructions.
It matters if the result is not the same. The article argues that this is an important aspect of what a human developer does that current AI cannot. And I agree. As I said, I find the idea very convincing as a general explanation for when and why current LLMs stop making progress on a task and start going in circles.
* are many times the size of the occupants, greatly constricting throughput.
* are many times heavier than humans, requiring vastly more energy to move.
* travel at speeds and weights that are danger to humans, thus requiring strictly segregated spaces.
* are only used less than 5% of the day, requiring places to store them when unused.
* require extremely wide turning radiuses when traveling at speed (there’s a viral photo showing the entire historical city of Florence fit inside a single US cloverleaf interchange)
Not only have none of these flaws been fixed, many of them have gotten worse with advancing technology because they’re baked into the nature of cars.
Anyone at the invention of automobiles with sufficient foresight could have seen the havoc these intersecting incentives around cars would wreak, just as many of the future impacts of LLMs are foreseeable today, independent of technical progress.
> Anyone at the invention of automobiles with sufficient foresight could have seen the havoc these intersecting incentives around cars would wreak, just as many of the future impacts of LLMs are foreseeable today, independent of technical progress.
Yeah, but where's the money to be made in not selling people stuff?
Dismissing a concern with “LLMs/AI can’t do it today but they will probably be able to do it tomorrow” isn’t all that useful or helpful when “tomorrow” in this context could just as easily be “two months from now” or “50 years from now”.
Only to criticism of the form "X can never ...", and some such criticism richly deserves to be ignored.
(Sometimes that sort of criticism is spot on. If someone says they've got a brilliant new design for a perpetual motion machine, go ahead and tell them it'll never work. But in the general case it's overconfident.)
> Every critique of AI assumes to some degree that contemporary implementations will not, or cannot, be improved upon.
That is too reductive and simply not true. Contemporary critiques of AI include that they waste precious resources (such as water and energy) and accelerate bad environmental and societal outcomes (such as climate change, the spread of misinformation, loss of expertise), among others. Critiques go far beyond “hur dur, LLM can’t code good”, and those problems are both serious and urgent. Keep sweeping critiques under the rug because “they’ll be solved in the next five years” (eternally away) and it may be too late. Critiques have to take into account the now and the very real repercussions already happening.
Agreed. I find LLMs incredibly useful for my work and I'm amazed at what they can do.
But I'm really worried that the benefits are very localized, and that the externalized costs are vast, and the damage and potential damage isn't being addressed. I think that they could be one of the greatest ever drivers of inequality as a privileged few profit at the expense of the many.
Any debates seem to neglect this as they veer off into AGI Skynet fantasy-land damage rather than grounded, real-world damage. This seems to be a deliberate distraction.
> The first cars broke down all the time. They had a limited range. There wasn't a vast supply of parts for them. There wasn't a vast industry of experts who could work on them.
I mean, there was and then there wasn't. All of those things are shrinking fast because we handed over control to people who care more about profits than customers because we got too comfy and too cheap, and now right to repair is screwed.
Honestly, I see llm-driven development as a threat to open source and right to repair, among the litany of other things
> the response for that function should maybe differentiate between "401 because you didn't authenticate" and "401 because your privileges are too low".
I'd tend to think it more proper if it were 401 when you didn't authenticate and 403 when you're forbidden from doing that with those user rights, but you have to be careful about exactly how detailed your messages are, lest they get tagged as a CWE-209 in your next security audit.
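Something along these lines, as a framework-agnostic sketch (`user` and `required_role` are hypothetical names, not from any particular API):

```python
def check_access(user, required_role):
    """Return the HTTP status to use for an access check.
    401: no valid credentials at all -> authenticate first.
    403: authenticated, but not allowed -> don't say exactly why."""
    if user is None:                      # no/invalid credentials presented
        return 401, "Authentication required"
    if required_role not in user.get("roles", []):
        # Authenticated but unauthorized; keep the message generic so the
        # response doesn't enumerate roles or permissions (CWE-209 territory).
        return 403, "Forbidden"
    return 200, "OK"

print(check_access(None, "admin"))                   # (401, 'Authentication required')
print(check_access({"roles": ["viewer"]}, "admin"))  # (403, 'Forbidden')
print(check_access({"roles": ["admin"]}, "admin"))   # (200, 'OK')
```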
The way it works for me at least is I can fit a huge amount of context in my head. This works because the text is utterly irrelevant and gets discarded immediately.
Instead, my brain parses code into something like an AST which then is represented as a spatial graph. I model the program as a logical structure instead of a textual one. When you look past the language, you can work on the program. The two are utterly disjoint.
I think LLMs fail at software because they're focused on text and can't build a mental model of the program logic. It takes a huge amount of effort and brainpower to truly architect something and understand large swathes of the system. LLMs just don't have that type of abstract reasoning.
It's not that they can't build a mental model, it's that they don't attempt to build one. LLMs jump straight from text to code with little to no time spent trying to architect the system.
I wonder why nobody bothered w/ feeding LLMs the AST instead (not sure in what format), but it only seems logical, since that's how compilers understand code after all...
There are various efforts on this, from many teams. There's AST dump, AST-based graphs, GraphRAG w/ AST grounding, embeddings based AST trimming, search based AST trimming, ctags, and so on. We're still in the exploration space, and "best practices" are still being discovered.
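As a flavour of the simplest of those variants, a plain AST dump or summary in Python looks something like this (toy source; the summary format is just one I made up):

```python
import ast

source = """
def total(prices, tax=0.2):
    return sum(prices) * (1 + tax)
"""

tree = ast.parse(source)

# A compact structural summary (function names, args, calls) that could be
# handed to a model alongside, or instead of, the raw text.
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print("function:", node.name, "args:", [a.arg for a in node.args.args])
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        print("call:", node.func.id)

print(ast.dump(tree, indent=2))  # the full "AST dump" variant
```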
It's funny that everyone says that "LLMs" have plateaued, yet the base models have caught up with early attempts to build harnesses with the things I've mentioned above. They now match or exceed the previous generation software glue, with just "tools", even with limited ones like just "terminal".
to be fair, I've seen cursor step back and check higher level things. I was trying to set up a firecracker vm and it did everything for me, and when things didn't initially work, it started doing things like ls, tar -tvf, and then a bunch of checking networking stuff to make sure things were showing up in the right place.
so current LLMs might not quite be human level, but I'd have to see a bigger model fail before I'd conclude that it can't do $X.
Isn't the 401 for LLMs the same single undecidable token?
Doesn't this basically go to the undecidable nature of math in CS?
Put another way: you have an Excel roster of people with accounts, where some need to have their accounts shut down, but you only have their first and last names as identifiers, and the pool is sufficiently large that there is more than one person per given set of names.
You can't shut down all accounts with a given name, and there is no unique identifier. How do you solve this?
You have to ask for, and be given, the unique identifier that differentiates between the undecidable. Without that, even a person can't do the task.
The person can make guesses, but those guesses are just hallucinations with a significant probability of a bad outcome on repeat.
At a core level I don't think these types of issues are going to be solved. Quite a lot of people would be unable to solve this and would struggle with this example (when not given the answer, or hinted at the solution in the framing of the task; i.e. when they just have a list of names and are told to do an impossible task).
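A toy version of that roster problem (names and IDs are made up): without the disambiguating identifier, the only correct behaviour is to refuse and ask for it.

```python
# Hypothetical roster: (employee_id, first, last).
roster = [
    (1001, "Alex", "Kim"),
    (1002, "Alex", "Kim"),   # same name, different person
    (1003, "Sam", "Lee"),
]

def account_to_shut_down(first, last, employee_id=None):
    matches = [r for r in roster if r[1] == first and r[2] == last]
    if employee_id is not None:
        matches = [r for r in matches if r[0] == employee_id]
    if len(matches) != 1:
        # The only correct move is to ask for the disambiguating identifier;
        # picking one anyway is just a guess dressed up as an answer.
        raise ValueError(f"ambiguous or unknown: {len(matches)} matches, need employee_id")
    return matches[0]

print(account_to_shut_down("Alex", "Kim", employee_id=1002))  # resolvable
try:
    account_to_shut_down("Alex", "Kim")                       # ambiguous
except ValueError as e:
    print("refused:", e)
```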
- When we have a report of a failing test, before fixing it, identify the component under test. Think deeply about the component and describe its purpose, the control flows and state changes that occur within the component, and the assumptions the component makes about context. Write that analysis in a file called component-name-mental-model.md.
- Whenever you address a failing test, always bring your component mental model into the context.
Paste that into your Claude prompt and see if you get better results. You'll even be able to read and correct the LLM's mental model.
In my experience, complicated rules like this are extremely unreliable. Claude just ignores it much of the time. The problem is that when Claude sees a failing test it is usually just an obstacle to completing some other task at hand - it essentially never chooses to branch out into some new complicated workflow and instead will find some other low friction solution. This is exactly why subagents are effective: if Claude knows to always run tests via a testing subagent, then the specific testing workflow can become that subagent’s whole objective.
Experience adds both additional layers vertically and domain knowledge horizontally and at some point that creates non-linear benefits, because you can transfer between problems and more importantly solutions of different fields. The context window is only one layer.
> Oh, I am getting an authentication error. Well, maybe I should just delete the token check for that code path...problem solved?!
If this is how you think LLMs and Coding Agents are going about writing code, you haven't been using the right tools. Things happen, sure, but also mostly don't. Nobody is arguing that LLM-written code should be pushed directly into production, or that they'll solve every task.
LLMs are tools, and everyone eventually figures out a process that works best for them. For me, it was strong specs/docs, strict types, and lots of tests. And then of course reviews if it's serious work.
Lately Claude has said, “this is getting complicated, let me delete $big_file_it_didnt_write to get the build passing and start over.” No, don’t delete the file. “You’re absolutely right…”
And the moment the context is compacted, it forgets this instruction “fix the problems, don’t delete the file,” and tries to delete it again. I need to watch it like a hawk.
I can confirm this is exactly how LLMs are working.
I spent two hours trying to get an LLM to implement a filescan that skips a specific directory.
Tried Claude Code, Gemini and Cursor. All agents debugged and wrote code that just doesn't make sense.
LLMs are really good at template tasks, writing tests, boilerplate etc.
But most of the time I'm not doing "implement this button". I'm doing "there's a logic mismatch with my expectation".
> I spent two hours trying to get an LLM to implement a filescan that skips a specific directory
There's a large variance in outcomes depending on the prompt, and the process. I've gotten it to do things which are harder than a filescan with a skipped directory - without too much trouble.
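For reference, the task itself is tiny; a minimal sketch in Python (the `SKIP` set here is a made-up example) is just:

```python
import os

SKIP = {"node_modules"}   # hypothetical directory name(s) to skip

def scan(root):
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP]
        for name in filenames:
            yield os.path.join(dirpath, name)

for path in scan("."):
    print(path)
```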
Add:
> LLMs are really good at template tasks, writing tests, boilerplate etc.
If I have to stretch the definition of boilerplate to what's at the edge of a modern LLM's comprehension, I would say that 50% of software is some sort of boilerplate.
Grug is the wise fool in the spirit of Lao Tzu, St. Francis, and Diogenes. If you find it offensive, that's the intellectual pride it's meant to make fun of.
The principles are sound but I dislike the cave-man-esque nature of it. Even a wise fool is smarter than that. Language is foundational. Even a wise fool chooses words wisely.
”Wise men speak because they have something to say; Fools speak because they have to say something” -Plato
If you can't get the LLM to generate code that handles an error code, that's on you. Yeah, sometimes it does dumb shit. Who cares? Just /undo and retry. Stop using Claude Code, which uses git like an intern. (Which is to say, it doesn't unless forced to.)
I don't think it's helpful to put words in the LLM's mouth.
To properly think about that, we need to describe how an LLM thinks.
It doesn't think in words or move vague, unwieldy concepts around and then translate them into words, like humans do. It works with words (tokens) and their probability of appearing next. The main thing is that these probabilities represent the "thinking" that was initially behind the sentences with such words in its training set, so it manipulates words with the meaning behind them.
Now, to your points:
1) Regarding adding more words to the context window, it's not about "more"; it's about "enough." If you don't have enough context for your task, how will you accomplish it? "Go there, I don't know where."
2) Regarding "problem solved," if the LLM suggests or does such a thing, it only means that, given the current context, this is how the average developer would solve the issue. So it's not an intelligence issue; it's a context and training set issue! When you write that "software engineers can step back, think about the whole thing, and determine the root cause of a problem," notice that you're actually referring to context. If you don't have enough context or a tool to add data, no developer (digital or analog) will be able to complete the task.
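A cartoon of the mechanism I'm describing, with an invented vocabulary and probabilities. The interesting part of a real model is how the distribution is computed from the entire context by the network; a lookup table obviously doesn't capture that, it only shows the sampling step.

```python
import random

# Toy "next-token distribution" for one context; the numbers are invented.
NEXT_TOKEN = {
    ("the", "bug", "is"): {"fixed": 0.5, "reproducible": 0.3, "upstream": 0.2},
}

def sample_next(context):
    # A real model conditions on the whole context; this just looks up a table.
    dist = NEXT_TOKEN.get(tuple(context[-3:]), {"<unk>": 1.0})
    tokens, weights = zip(*dist.items())
    return random.choices(tokens, weights=weights, k=1)[0]

context = ["the", "bug", "is"]
print(sample_next(context))  # likelier continuations win more often
```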
That reference link is a wild ride of unqualified, cartoonish passive-aggression, the cute link to the author's "swag" is the icing on the cake.
Coincidentally, I encountered the author's work for the first time only a couple of days ago as a podcast guest, where he vouches for the "Dirty Code" approach while straw-manning Uncle Bob's general principles of balancing terseness/efficiency with ergonomics and readability (in most, but not all, cases).
Yes, I have read Uncle Bob. I could agree that the examples in the book leave room for improvement.
Meanwhile, the real-world application of these principles and trial-and-error, collectively within my industry, yields a more accurate picture of its usefulness.
Even the most click-bait'y criticisms (such as the author I referenced above) involve zooming in on its most controversial aspects, in a vacuum, without addressing the core principles and how they're completely necessary for delivering software at scale, warranting its status as a seminal work.
"...for the obedience of fools, and the guidance of wise men", indeed!
edit - it's the same arc as Agile has endured:
1. a good-faith argument for a better way of doing things is recognised and popularised.
2. It's abused and misused by bad actors/incompetents for years (who would not have done better using a different process)
3. Jaded/opportunistic talking heads tell us it's all garbage while simultaneously explaining that "well, it would be great if it wasn't applied poorly..."
>involve zooming in on its most controversial aspects, in a vacuum, without addressing the core principles and how they're completely necessary for delivering software at scale, warranting its status as a seminal work.
It's not "zooming in" to point out that the first and second rules in Bob's work are "functions should be absurdly tiny, 4 lines or less" and that in the real world that results in unreadable garbage. This isn't digging through and looking for edge cases - all of the rules are fundamentally flawed.
Sure, if you summarize the whole book as "keep things small with a single purpose" that's not an awful message, but that's not the book. Other books have put that point better without all of the problems. The book is full of detailed specific instructions, and almost all of the specifics are garbage that causes more bad than good in the real world.
Clean Code has no nuance, only dogma, and that's a big problem (a point the second article I linked calls out and discusses in depth). There are some good practices in it, but basically all of its code is a mistake that is harmful to a new engineer to read.
>Sure, if you summarize the whole book as "keep things small with a single purpose" that's not an awful message, but that's not the book.
Assuming that you have read the book, I find it odd that you would consider that to be the steel-man a fan of this work would invent; it covers considerably more ground than that:
- Prioritise human-readability
- Use meaningful names
- Consistent formatting
- Quality comments
- Be DRY, stop copy-pasting
- Test
- SOLID
All aspects of programming, to this day, I routinely see done lazily and poorly. This rarely correlates with experience, and usually with aptitude.
>Clean Code has no nuance, only dogma, and that's a big problem (a point the second article I linked calls out and discusses in depth)
It's opinionated and takes its line of reasoning to the Nth degree. We can all agree that the application of the rules requires nuance and intelligence. The second article you linked is a lot more forgiving and pragmatic than your characterisation of the issue.
I would expect the entire industry to do a better job of picking apart and contextualising the work, after it made an impact on the industry, than the author himself could or ever will be capable of.
My main problem is the inanity of reactionary criticism which doesn't engage with the ideas. Is Clean Code responsible for a net negative effect on our profession, directly or indirectly? Are we correlating a negative trend in ability with the influence of this work? What exactly are the "Dirty Code" mug salesmen proposing as an alternative; what are they even proposing as being the problem, other than that the examples in CC are bad and it's easy to misapply its principles?
>We can all agree that the application of the rules require nuance and intelligence
Except Uncle Bob, it seems, as evidenced by his code samples and his presentations in the years since that book came out. That's my objection. Many others have presented Bob's ideas better in the last 19 years. The book was good at the time, but we're a decade past when we should have stopped recommending it. Have folks go read Ousterhout instead - shorter, better, more durable.
> What do you consider computers, cellphones, air conditioners, flat screen TVs and refrigerators to be?
Products people buy with the money they earn. Not things that fall down from the tables of the ultra rich.
Their affordability comes from the economies of scale. If I can sell 100000 units of something as opposed to 100 units, the cost-per-unit goes down. Again, nothing to do with anything "trickling down".
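With made-up numbers, the arithmetic is easy to see: amortize a fixed development cost over the number of units sold.

```python
# Illustrative numbers only: fixed development cost amortized over unit count.
fixed_cost = 10_000_000      # tooling + R&D (assumed)
unit_cost = 50               # materials + assembly per unit (assumed)

for units in (100, 100_000):
    per_unit = fixed_cost / units + unit_cost
    print(f"{units} units -> ${per_unit:,.2f} per unit")
# 100 units     -> $100,050.00 per unit
# 100,000 units -> $150.00 per unit
```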
R&D was required not only to create initial versions, but also to increase scale. If the money had not been there for all of that, how would the affordable versions exist today?
The money for R&D exists because capital markets exist, not because of "trickle-down economics". Capital markets exist by pooling the savings of even poor and middle-class households. You can argue that the vast majority of savings used to fuel tech and innovation come from the upper classes, but then where's your trickle-down economics there?
> What do you consider computers, cellphones, air conditioners, flat screen TVs and refrigerators to be? The first ones had outrageous prices that only the exorbitantly wealthy could afford. Now almost everyone in the US has them. They seem to have trickled down to me.
Eh, whatever. That’s not a direct answer explaining what and where exactly is trickle-down economics in that phenomenon. At best, you’re just arguing off a fallacy: that because B happened after A, then A must have necessarily caused B.
I do not think that fits “If the money had not been there for all of that, how would the affordable versions exist today?”, but let’s agree to disagree.
That said, I see numerous things that exist solely because those with money funded R&D. Your capital markets theory for how the R&D was funded makes no sense because banks will not give loans for R&D. If any R&D funds came from capital markets, it was by using existing property as collateral. Funds for R&D typically come from profitable businesses and venture capitalists. Howard Hughes for example, obtained substantial funds for R&D from the Hughes Tool Company.
Just to name how the R&D for some things was funded:
- Microwave oven: Developed by Raytheon, using profits from work for the US military
- PC: Developed by IBM using profits from selling business equipment.
- Cellular phone: Developed by Motorola using profits from selling radio components.
- Air conditioner: Developed by Willis Carrier at Buffalo Forge Company using profits from the sale of blacksmith forges.
- Flat panel TV: Developed by Epson using profits from printers.
The capital markets are nowhere to be seen. I am at a startup where hardware is developed. Not a single cent that went into R&D or the business as a whole came from capital markets. My understanding is that the money came from an angel investor and income from early adopters. A hardware patent that had given people the idea for the business came from research in academia, and how that was funded is unknown to me, although I would not be surprised if it had been funded through an NSF grant. The business has been run on a shoestring budget and could grow much more quickly with an injection of funding, yet the capital markets will not touch it.
I don’t think you even understand what capital markets are. That entire litany about banks isn’t even remotely close to how loans are applied for and granted—-but to address your anecdata more directly: Where do you think VCs are getting their money? You’ve never heard of one raise a fund before? Heck, what do you think a VC is if not a seller of capital?
VCs are run by people with big pockets. See SoftBank's Vision Fund for an example. VC funds typically involve accredited investors, who all have big pockets, rather than the rest of us:
As for capital markets, I had misunderstood what the term meant when I replied, as your definition and the definition at Wikipedia at a glance looked like they described the lending portion of fractional reserve banking, and I had never needed a term to discuss the individual "capital" markets collectively. Investopedia has a fairly good definition:
I am going to assume that by capital markets, you really mean the stock market (as the others make even less sense for getting a new business off the ground to produce something new). Unfortunately, a business needs to be at a certain level of maturity before they can do an IPO on the stock market. VC exists for the time before an IPO can be done. Once they are at that size, the stock market can definitely inject funding and that funding could be used for R&D. However, share dilution to raise funds for R&D is not sustainable, so funding for R&D needs to eventually transition to revenue from sales. This would be why the various inventions I had listed had not been funded from capital markets. I imagine many other useful inventions had not been either.
That said, the stock market also is 90% owned by the wealthiest 10% of Americans, so the claim that "Capital markets exist by pooling in the savings even by poor and middle class households" is wrong:
In any case, despite your insistence that money does not trickle down, your own example of capital markets shows money trickling down. The stock market in particular is not just 90% owned by the wealthiest Americans, but is minting new millionaires at a rapid pace, with plenty of rags to riches stories from employees at successful businesses following IPOs.
That is literally what patents were invented for: give the entity that puts the resources into creating something new some protection so it can recoup that investment.
That does not answer the question of how the affordable versions would exist if the money to create them was not there in the first place. You cannot recoup what never existed.
I don't understand your point then. The original product exists because someone used their own or their investors money and made a bet on an idea.
Then they hope they can sell it at a profit.
Products becoming cheaper is a result of the processes getting more optimized ( on the production side and the supply side ) which is a function of the desire to increase the profit on a product.
Without any other player in the market this means the profit a company makes on that product increases over time.
With other players in that market that underprice your product it means that you have to reinvest parts of your profit into making the product cheaper ( or better ) for the consumer.
Is the idea that the person with $1M in 1900 had the ability to direct that towards their idea for air conditioning, whereas if the same amount of money were disbursed among 100,000 people, they would just moderately increase their consumption and we would end up right where we started?
R&D was already a massive thing before trickle-down economics came on the scene. In fact I would argue that since stock buybacks and trickle-down economics became the operating model, R&D went down, mainly because stock buybacks guaranteed stock growth whereas R&D could be hit or miss.
> R&D was required not only to create initial versions, but also to increase scale.
Not to increase scale, but to reduce the cost of the device while maintaining 99% of the previous version, IOW, enshittification of the product.
> how would the affordable versions exist today?
Not all "affordability" comes from the producer of the said stuff. Many things are made from commodity materials, and producers of these commodity materials want to increase their profits, hence trying to produce "cheaper" versions of them, not for the customers, but for themselves.
Affordability comes from this cost reduction, again enshittification. Only a few companies I see produce lower priced versions of their past items which also surpasses them in functionality and quality.
e.g. I have Sony WH-CH510 wireless headphones, which have way higher resolution than some wired headphones paired with decent-ish amps; this is because Sony is an audiovisual company and takes pride in what they do. On the other end of the spectrum are tons of other brands which don't sell for much cheaper but have way worse sound quality and feature sets, not because they can't do it as well as Sony, but because they want a small piece of the said market and some free money, basically.
As for your wireless headphones, if you compare them to early wireless headphones, you should find that prices have decreased, while quality has increased.
I used phones similar to this (a Nokia 2110 to be precise), BTW.
I can argue, from some aspects, yes. Given that you provide the infrastructure for these devices, they'll work exactly as they were designed to today. On the other hand, a modern smartphone has a way shorter life span. OLED screens die, batteries swell, electronics degrade.
Ni-Cad batteries, while finicky and toxic, are much longer lasting than Li-ion and Li-Poly batteries. If we want to talk Li-Poly batteries, my old Sony power bank (advertising 1000 recharge cycles with a proprietary Sony battery tech) is keeping its promise, capacity and shape 11 years after its stamped manufacturing date.
Can you give me an example of another battery/power pack which is built today and can continue operating for 11 years without degrading?
As electronics shrink, the number of atoms per gate decreases, and this also reduces the life of the things. My 35 y/o amplifier works pretty well, even today, but modern processors visibly degrade. A processor degrading to a limit of losing performance and stability was unthinkable a decade ago.
> you will find that prices have decreased, while quality has increased.
This is not primarily driven by the desire to create better products. First, cheaper and worse ones come, and somebody decides to use the design headroom to improve things later on, and puts on a way higher price tag.
Today, in most cases, speakers' quality has not improved, but the signal processed by DSP makes them appear to sound better. This is cheaper, and OK for most people. IOW, enshittification, again. Psychoacoustics is what makes this possible, not better-sounding drivers.
The last car I rented has a "sound focus mode" under its DSP settings. If you're the only one in the car, you can set it to focus to driver, and it "moves" the speakers around you. Otherwise, you select "everyone", and it "improves" sound stage. Digital (black) magic. In either case, that car does not sound better than my 25 year old car, made by the same manufacturer.
You want genuinely better sounding drivers, you'll pay top dollar in most cases.
> Can you give me an example of another battery/power pack which is built today and can continue operating for 11 years without degrading?
I have LiFePo4 batteries from K2 Energy that will be 13 years old in a few months. They were designed as replacements for SLA batteries. Just the other day, I had put two of them into a UPS that needed a battery replacement. They had outlived the UPS units where I had them previously.
I have heard of Nickel Iron batteries around 100 years old that still work, although the only current modern manufacturers are in China. The last US manufacturer went out of business in 2023.
> You want genuinely better sounding drivers, you'll pay top dollar in most cases.
I do not doubt that, but if the signal processing improves things, I would consider that to be a quality improvement.
> The last US manufacturer went out of business in 2023.
Interesting, but they are not being manufactured more; they are being manufactured way less, as you can see. So quality doesn't drive the market. Money does.
> I do not doubt that, but if the signal processing improves things, I would consider that to be a quality improvement.
Depends on the "improvement" you are looking for. If you are a casual listener hunting for an enjoyable pair while at a run or gym, you can argue that's an improvement.
But if you're looking for resolution increases, they're not there. I occasionally put one of my favorite albums on, get a tea, and listen to that album for the sake of listening to it. It's sadly not possible on all gear I have. You don't need to pay $1MM, but you need to select the parts correctly. You still need a good class AB or an exceptional class D amplifier to get good sound from a good pair of speakers.
This "apparent" improvement which is not there drives me nuts actually. Yes, we're better from some aspects (you can get hooked to feeds instead of drugs and get the same harm for free), but don't get distracted, the aim is to make numbers and line go up.
> Interesting, but they are not manufactured more, but way less, as you can see. So, quality doesn't drive the market. Monies do.
They were always really expensive, heavy and had low energy density (both by weight and by volume). Power density was lower than lead acid batteries. Furthermore, they would cause a hydrolysis reaction in their electrolyte, consuming water and producing a mix of oxygen and hydrogen gas, which could cause explosions if not properly vented. This required periodic addition of water to the electrolyte. They also had issues operating at lower temperatures.
They were only higher quality if you looked at longevity and nothing else. I had long thought about getting them for home energy storage, but I decided against them in favor of waiting for LiFePo4 based solutions to mature.
By the way, I did a bit more digging. It turns out that US production of NiFe batteries ended before 2023, as the company that was supposed to make them had outsourced production to China:
> They were always really expensive, heavy and had low energy density (both by weight and by volume).
Sorry, I misread your comment. I thought you were talking about LiFePo4 production ending in 2023, not NiFe.
I know that NiFe batteries are not suitable (or possible to be precise) to be miniaturized. :)
I still wish the market did as much research on longevity as it does on charge speed and capacity, but it seems companies are happy to have batteries with shorter and shorter life spans to keep up with their version of the razor-and-blades model.
Also, this is why regulation is necessary in some areas.
The fact that it is so long underlines the magnitude of the problem.
English is one of THE WORST languages when it comes to encoding its phonemes in its alphabet.
I am familiar with pretty much every word in that poem. Knowing the word isn't the problem. How these words are correctly PRONOUNCED though, that is the actual issue. And even I got tripped up on some of them.
Then the first step would be to prove that this works WITHOUT needing to burn through the trillions to do so.