I can only hope that Cerebras is able to keep their first-party inference product going. It's incredible to run a strong model at interactive latencies for whole results. Routinely less than seconds to produce entire files / documents / outputs / …
How do you determine flawlessness? How do you even approximate a guarantee of it? To what specification is flawlessness judged, and can you precisely and completely relay that spec to your friendly local robot more efficiently than it can vendor / import existing libraries?
It'll just spit the code out. I vibe coded something with cookie handling the other day that worked. Should I have done it? Nope. But the AI did it and I allowed it.
The concept of using a library for everything will become outdated
It read the library and created a custom implementation for my use case. The implementation was interoperable with a popular Next.js library. It was a hack, sure, but it also took me three minutes.
The value of a library is not just that it does a thing you want, but that it doesn’t do all the things you’d prefer it didn’t.
It’s easy to write a cookie parser for a simple case; clearly your robot was able to hand you one for millidollars. How confident are you that you’ve exhaustively specified the exact subset of situations your code is going to encounter, so the missing functionality doesn’t matter? How confident are you that its implementation doesn’t blow up under duress? How many tokens do you want to commit to confirming that (reasoning, test, pick your poison)?
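To make the "easy for the simple case" point concrete, here's a minimal sketch (my own illustration, not the code from the thread): a naive three-minute cookie parser that agrees with Python's battle-tested `http.cookies` on the happy path, then silently mishandles an RFC 6265 quoted value that the stdlib parser gets right.

```python
from http.cookies import SimpleCookie

# Naive "three-minute" parser: fine for the happy path.
def parse_cookies(header: str) -> dict:
    pairs = (item.split("=", 1) for item in header.split("; ") if "=" in item)
    return {k: v for k, v in pairs}

# Happy path: naive parser and stdlib agree.
hdr = "session=abc123; theme=dark"
assert parse_cookies(hdr) == {"session": "abc123", "theme": "dark"}

# Edge case the naive version mishandles: a quoted value containing
# a semicolon. The naive parser keeps the quotes verbatim; the
# stdlib parser understands the quoting and strips it.
quoted = 'token="a;b=c"'
naive = parse_cookies(quoted)
assert naive["token"] == '"a;b=c"'  # wrong: quotes leaked through

jar = SimpleCookie()
jar.load(quoted)
assert jar["token"].value == "a;b=c"  # stdlib handles the quoting
```

The naive version "works" right up until a value is quoted, a name is malformed, or a separator lacks its space; that long tail is exactly what the library encodes.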
I mean ASI could just generate the pixels of a webpage at 60hz. And who needs cookies if ASI knows who you are after seeing half a frame of the dust on your wall, and then knows all the pixels and all the clicks that have transpired across all your interactions with its surfaces. And who needs webpages when ASI can conjure anything on a screen and its robots and drones can deliver any earthly desire at the cost of its inputs. That is if we’re not all made into paper clips or dead fighting for control of the ASML guild at that point.
I say all that mostly in jest, but to take your point somewhat seriously: the most accurate thing you’ve said is that no one is ready for super intelligence. What a great and terrible paroxysm it would be.
A stack of 38x 1U and a switch does not a cloud make.
Slightly less pithy: they're selling rack-scale systems, with power, hardware, network, and control-plane software all integrated. Something that presents to the user more like an API to interact with than a pile of servers to be managed.
No, your expectations are not wrong. I'm a small business. A fully stacked AI/GPU cabinet is multiples of this. A single GH200 based server will have 2x2.7 kW power supplies in a 1U form factor. As you can imagine, I am not running a cabinet full of such servers. But you don't need AI power requirements to do normal software. And there's lots of normal software to do!
Having worked in the space, 6kVA is the norm from 10-15 years ago, 12kVA is the standard for regular compute workloads. With HPC/AI all bets are off though.
Tesla Roadster took a bunch of preorders at $50-250k down almost a decade ago. More recently, Taycan did reasonable-ish volume at $100-200k/unit. There (at least once) was a market for such things. It's definitely not the same market as ICE super/hypercars, but there are some who might enjoy a silent, luxurious car with a sub-2-second 0-60 as a complement to other cars in the garage.
A natural question to ask in response, but in each case I didn't find out in a way that would let me comment on their engineering, and since I've now said something about exactly that, I can't say who. I get how it comes off. If the Criterion of Embarrassment helps: I find myself sorely unqualified for this new world.
To be honest, I shouldn't have said anything in the first place. It isn't useful as it is because you can't reasonably believe me. I just feel blindsided by the stuff that's working now.
Perhaps we'll go the way of the Space Shuttle? One group writes a highly-structured, highly-granular, branch-by-branch 2500-page spec, and another group (LLM) writes 25000 lines of code, then the first group congratulates itself on producing good software without having to write code?
Code written in a HLL is a sufficient[1] description of the resulting program/behavior. The code, in combination with the runtime, define constraints on the behavior of the resulting program. A finished piece of HLL code encodes all the constraints the programmer desired. Presuming a 'correct' compiler/runtime, any variation in the resulting program (equivalently the behavior of an interpreter running the HLL code) varies within the boundaries of those constraints.
Code in general is also local, in the sense that small perturbation to the code has effects limited to a small and corresponding portion of the program/behavior. A change to the body of a function changes the generated machine code for that function, and nothing else[2].
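This locality is observable even at the bytecode level. The sketch below (my illustration, using CPython's compiled code objects as a stand-in for generated machine code) edits the body of one function and checks that only that function's compiled code changes, while its sibling's is byte-for-byte identical:

```python
# Locality sketch: editing one function's body changes only that
# function's compiled code object; an untouched sibling function
# compiles to exactly the same bytecode as before.
src_v1 = "def f(x): return x + 1\ndef g(x): return x * 2\n"
src_v2 = "def f(x): return x - 1\ndef g(x): return x * 2\n"

def compiled_bodies(src):
    ns = {}
    exec(compile(src, "<mod>", "exec"), ns)
    return ns["f"].__code__.co_code, ns["g"].__code__.co_code

f1, g1 = compiled_bodies(src_v1)
f2, g2 = compiled_bodies(src_v2)
assert f1 != f2  # the edited function's bytecode changed
assert g1 == g2  # the neighbor's bytecode did not
```

A prompt offers no analogous guarantee: tweaking one sentence can change any part of the generated program.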
Prompts provided to an LLM are neither sufficient nor local in the same way.
The inherent opacity of the LLM means we can make only probabilistic guarantees that the constraints the prompt intends to encode are reflected in the output. No theory we currently have can even attempt to supply such a guarantee. A given (sequence of) prompts might result in a program that happens to encode the constraints the programmer intended, but that _must_ be verified by inspection and testing.
One might argue that of course an LLM can be made to produce precisely the same output for the same input; it is itself a program after all. However, that 'reproducibility' should not convince us that the prompts + weights totally define the code any more than random.Random(1).random() being constant should cause us to declare python's .random() broken. In both cases we're looking at a single sample from a pRNG. Any variation whatsoever would result in a different generated program, with no guarantee that program would satisfy the constraints the programmer intended to encode in the prompts.
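The pRNG analogy runs directly: fixing the seed makes the output reproducible, but reproducibility is not determination, because the tiniest change to the input yields an unrelated output with no guarantee it satisfies the constraints you meant to encode.

```python
import random

# Same seed, same output: "reproducible" in the trivial sense.
assert random.Random(1).random() == random.Random(1).random()

# A minimal perturbation of the input (seed 1 -> seed 2) produces
# an unrelated value, not a small variation on the first one.
a = random.Random(1).random()
b = random.Random(2).random()
assert a != b
```

Calling the seed-1 output "what `.random()` computes" would be absurd; by the same token, prompts + weights reproducing one program shouldn't convince us the prompts define it.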
While locality falls similarly, one might point out that an agentic LLM can easily make a local change to code if asked. I would argue that an agentic LLM's prompts are not just the inputs from the user, but the entire codebase in its repo (even if sparsely attended to by RAG or retrieval tool calls or whatever). The prompts _alone_ cannot be changed locally in a way that guarantees a local effect.
The prompt -> LLM -> program abstraction presents leaks of such volume and variety that it cannot be ignored the way the code -> compiler -> program abstraction can. Continuing to make forward progress on a project requires the robot (and likely the human) attend to the generated code.
Does any of this matter? Compilers and interpreters themselves are imperfect; their formal verification is incomplete and underutilized. We have to verify properties of programs via testing anyway. And who cares if the prompts alone are insufficient? We can keep a few hundred KB of code around and retrieve over it to keep the robot on track, and the human more-or-less in the loop. And if it ends up rewriting the whole thing every few iterations as it drifts, who cares?
For some projects, where quality, correctness, interoperability, novelty, etc. don't matter, that might be fine. Even in those, defining a program purely via prompts seems likely to devolve eventually into aggravation. For the rest, the end of software engineering seems to be greatly exaggerated.
[2]: There are, of course, many tiny exceptions to this: we might be changing a function that's inlined all over the place; we might be changing something that's explicitly global state; we might vary the timing of something that causes async tasks to schedule in a different order; etc. I believe the point stands regardless.
https://cloud.cerebras.ai/