$10B of compute credits on a capped profit deal that they can break as soon as they get AGI (i.e. the $10T invention) seems pretty favorable to OpenAI.
I’d be significantly less surprised if OpenAI never made a single $ in profit than if they somehow invented “AGI” (of course nobody has a clue what that even means, so maybe there is a chance just because of that...)
Leaving aside the “AGI on paper” point a sibling correctly made, your point shares the same basic structure as noting that any VC investment is a terrible deal if you only 2x your valuation. You might get $0 if there is a multiple on the liquidation preference!
OpenAI are clearly going for the BHAG. You may or may not believe in AGI-soon, but they do, and they are all in on this bet. So they simply don’t care about the failure case (i.e. no AGI within the timeframe for which they can maintain runway).
As the article states, YC’s key asset is their deal flow. You simply don’t get that if you only write 1% of the best seed checks.
If you don’t eat the pie, someone else will. If someone else eats more pie than you, they will be stronger than you and eat your pie.
Switching analogies, unless you have a defensible niche with a wide moat, your position of power is an unstable equilibrium; in the open market only a monopoly is a stable attractor state.
The advantage of your position is that being a tech lead gives you a lot of ownership and responsibility, and that is one of the best ways to learn a lot in a short time.
The disadvantage is that it’s hard to grow beyond what you can visualize, so past a certain point you’ll likely be growing slower than you would working under someone with 10-20 more years of experience.
I would advise one of two paths: either stick around a year or two until you can confidently own “tech lead” on your resume and move on, or commit to a longer stint and really reap the learning benefits of the technical leadership position.
So much for the unsolicited career advice :). To answer the direct question, a few things I found useful as a startup CTO:
- Follow the news for your tech stack. Stay up to date with the various technologies. Go to meetups.
- Think about the long-term development of your system. Spend 1% of your time thinking about where you might be in a year or two, and do some experiments to explore those paths. Your codebase is the best place for you to experiment with new tools as it’s a real-world system with the warts that reality brings. You don’t have to merge these experiments! As CTO I had probably 20% of my PRs as RFC / experimental work. (I do not advise contributing to FOSS for learning in your scenario. Feel free to do so if you enjoy it though.)
- Keep an Architecture Decision Log (a minimal sketch of what an entry might look like follows this list). Review the big decisions you made in hindsight, and learn the lessons you can only learn if you own a system for multiple years. E.g. this DB choice seemed good and bought us a year of scale reprieve, but then required a 2 man-year rewrite down the road. Knowing the true cost of the “I” in ROI will make you much better at judging tradeoffs.
- Beware burnout. If you are studying and leisure coding in your spare time you will eventually run the tank dry. It might take 5 years or it might take less. Just be mindful.
- Don’t be terrified of burnout. The tech lead position is a unique opportunity to feel ownership of a system and have the autonomy to try things out. It’s a role where pushing harder can yield more leverage than in a normal position. So putting in some extra hours can pay off for your career in the long run.
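To make the decision-log point concrete, an entry can be just a handful of lines. This is one possible shape (the example content is hypothetical), not a prescribed format:

    Title: Store order metadata in a Postgres JSONB column
    Date / Status: 2021-03, accepted
    Context: team of three, flexible metadata needed, no capacity to run a second datastore
    Decision: JSONB column on the orders table; revisit once we approach ~10M rows
    Review (written ~18 months later): bought the scale reprieve we wanted,
      but the eventual cleanup migration cost about two weeks

The Review field you fill in much later is where most of the learning happens.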
In all the systems I’ve built (mostly Django), you need to tolerate application versions vN and vN+1 running against the same DB simultaneously; you are not going to turn off your app to upgrade the DB.
You’ll have some Pods on the old application version while you do your gradual upgrade.
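For example, the usual way I’d handle a schema change under that constraint is an expand/contract migration. Here is a minimal sketch; the app, model, and field names are invented for illustration:

    # Phase 1: ship this while vN Pods are still running. The new column is
    # nullable, so old code that never writes it keeps working.
    from django.db import migrations, models

    class Migration(migrations.Migration):
        dependencies = [("orders", "0007_previous")]
        operations = [
            migrations.AddField(
                model_name="order",
                name="shipping_region",
                field=models.CharField(max_length=32, null=True),
            ),
        ]

    # Phase 2: once every Pod runs vN+1 (which writes the field), backfill old
    # rows and tighten the constraint in a follow-up migration; drop any legacy
    # column last, once nothing reads it anymore.

Every intermediate state has to be valid for both application versions at once.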
How do you envision rolling upgrades working here?
You’re missing the point being made in this thread, which is that there might be subtle long-term impairments to the genetic condition described in the OP.
Indeed the article discusses this thoroughly, noting that, since it’s a very small sample, you can’t rule out anything but a very strongly negative fitness impact.
There’s simply not enough data to rule out the hypothesis that folks with this condition are slightly sleep-deprived vs their theoretical without-mutation genotype baseline.
This is the American model of anti-trust, very much not the European model (which explicitly protects competition for its own sake, even when consumers are not harmed by the monopolistic behavior).
> Making some illegal moves doesn’t invalidate the demonstrated situational logic intelligence
That’s exactly what it does. 1 illegal move in 1 million or 100 million or any other sample size you want to choose means it doesn’t understand chess.
People in this thread are really distracted by the medical analogy, so I’ll offer another: you’ve got a bridge that lets millions of vehicles cross, but randomly falls down if you tickle it wrong, say with a car of some rare color. One key aspect of bridges is that they work reliably for any vehicle, and once they fail they don’t work for any vehicle. A bridge that sometimes fails and sometimes doesn’t isn’t a bridge so much as a death trap.
Humans with correct models may nevertheless make errors in rule applications. Machines are good at applying rules, so when they fail to apply rules correctly, it means they have incorrect, incomplete, or totally absent models.
Without using a word like “understands”, it seems clear that the same apparent mistake can have different causes, and model errors are very different from model-application errors. In a math or physics class this is roughly the difference between a carry-the-one arithmetic error and using an equation from a completely wrong domain. The word “understands” is loaded in discussions of LLMs, but everyone knows which mistake is going to get partial credit vs zero credit on an exam.
>Humans with correct models may nevertheless make errors in rule applications.
Ok
>Machines are good at applying rules, so when they fail to apply rules correctly, it means they have incorrect or incomplete models.
I don't know why people continue to force the wrong abstraction. LLMs do not work like 'machines'. They don't 'follow rules' the way we understand normal machines to 'follow rules'.
>so when they fail to apply rules correctly, it means they have incorrect or incomplete models.
Everyone has incomplete or incorrect models. It doesn't mean we always say they don't understand. Nobody says Newton didn't understand gravity.
>Without using a word like “understands” it seems clear that the same apparent mistake has different causes.. and model errors are very different from model-application errors.
It's not very apparent, no. You've just decided it has different causes because of preconceived notions about how you think all machines must operate in all configurations.
LLMs are not the logic automatons of science fiction. They don't behave or act like normal machines in any way. The internals run some computations to make predictions, but so does your nervous system. Computation is substrate-independent.
I don't even know how you can make this distinction without seeing what sort of illegal moves it makes. If it makes the sort that high-rated players make, then what?
I can’t tell if you are saying the distinction between model errors and model-application errors doesn’t exist or doesn’t matter or doesn’t apply here.
- Generally, we do not say someone does not understand just because of a model error. The model error has to be sufficiently large, or the model sufficiently narrow. No-one says Newton didn't understand gravity just because his model has an error in it, but we might say he didn't understand some aspects of it.
- You are saying the LLM is making a model error (rather than an application error) only because of preconceived notions of how 'machines' must behave, not because of any rigorous examination.
Suppose you're right: the internal model of the game rules is perfect, but the application of the model to choosing the next move is imperfect. Unless we can actually separate the two, does it matter? Functionally I mean, not philosophically. If the model really were correct, maybe we could get a useful version of it out by asking it to write a chess engine instead of act as a chess engine. But when the Prolog code for that is as incorrect as the illegal chess move was, will you say again that the model is correct, and its usage merely resulted in minor errors?
> You are saying the LLM is making a model error (rather than an application error) only because of preconceived notions of how 'machines' must behave, not because of any rigorous examination.
Here's an anecdotal examination. After much talk about LLMs and chess, and math, and formal logic, here's the state of the art, simplified from a dialog with GPT today:
> blue is red and red is blue. what color is the sky?
>> <blah blah, restates premise, correctly answers "red">
At this point fans rejoice, saying it understands hypotheticals and logic. Dialogue continues..
> name one red thing
>> <blah blah, restates premise, incorrectly offers "strawberries are red">
At this point detractors rejoice and declare that it doesn't understand. Now the conversation devolves into semantics or technicalities about prompt hacks, training data, weights. Whatever. We don't need chess. Just look at it: it's broken as hell. Discussing whether the error is human-equivalent isn't the point either. It's broken! A partially broken process is no solid foundation to build others on. And while there are some exceptions, an unreliable tool/agent is often worse than none at all.
>It's broken! A partially broken process is no solid foundation to build others on. And while there are some exceptions, an unreliable tool/agent is often worse than none at all.
Are humans broken? Because our reasoning is a very broken process. You say it's no solid foundation? Take a look around you. This broken processor is the foundation of society and the conveniences you take for granted.
For the vast, vast majority of human history, there wasn't anything even remotely resembling a non-broken general reasoner. And you know the funny thing? There still isn't. When people like you say LLMs don't reason, they hold them to a standard that doesn't exist. Where is this non-broken general reasoner, anywhere but in fiction and your own imagination?
>And while there are some exceptions, an unreliable tool/agent is often worse than none at all.
Since you clearly mean 'reliable' to be 'makes no mistakes/is not broken', then no human is a reliable agent.
Clearly, the real exception is when an unreliable agent is worse than nothing at all.
This feels more like a metaphysical argument about what it means to "know" something, which is really irrelevant to what is interesting about the article.
Replace the word with one of your own choice if that will help us get to the part where you have a point to make?
I think we are discussing whether LLMs can emulate chess-playing machines, regardless of whether they are actually, literally composed of a flock of stochastic parrots...
Try giving a random human 30 chess moves and asking them to make a non-terrible legal move. Even with the board clearly in front of them, average humans quite often try to make illegal moves. There are even plenty of cases where people reported a bug because the chess application didn't let them make an illegal move they thought was legal.
And the sudden comparison to something that's safety critical is extremely dumb. Nobody said we should tie the LLM to a nuclear bomb that explodes if it makes a single mistake in chess.
The point is that it plays at a level far far above making random legal moves or even average humans. To say that that doesn't mean anything because it's not perfect is simply insane.
> And the sudden comparison to something that's safety critical is extremely dumb. Nobody said we should tie the LLM to a nuclear bomb that explodes if it makes a single mistake in chess.
But it actually becomes safety critical very quickly whenever you say something like “works fine most of the time, so our plan going forward is to dismiss any discussion of when it breaks and why”.
A bridge failure feels like the right order of magnitude for the error rate and effective misery that AI has already quietly caused with biased models, where one in a million resumes or loan applications is thrown out. And a nuclear bomb would actually kill fewer people than a full-on economic meltdown. But I’m sure no one is using LLMs in finance at all, right?
It’s so arrogant and naive to ignore failure modes that we don’t even understand yet... at least bridges and steel have specs. Software “engineering” was always a very suspect name for the discipline, but whatever claim we had to it is weaker than ever.
It's not a goalpost move. As I've already said, I have the exact same problem with this article as I had with the previous one. My goalposts haven't moved, and my standards haven't changed. Just provide the data! How hard can it be? Why leave it out in the first place?
That’s what your VC investment would be buying; “pay experts to create a private training set for fine-tuning” is an obvious new business model that is probably under-appreciated.
If that’s the biggest gap, then YC is correct that it’s a good area for a startup to tackle.
It would be hard to find any experts that could be paid "to create a private training set for fine tuning".
The reason is that those experts do not own the code that they have written.
The code is owned by big companies like NVIDIA, AMD, Intel, Samsung and so on.
It is unlikely that these companies would be willing to provide the code for training, except for some custom LLM to be used internally by them, in which case the amount of code that they could provide for training might not be very impressive.
Even a designer who works at those companies may have great difficulty seeing significant quantities of archived Verilog/VHDL code, though one can hope that it still exists somewhere.
When I say “pay to create” I generally mean authoring new material, distilling your career’s expertise.
Not my field of expertise, but there seem to be experts founding startups etc. in the ASIC space, and Bitcoin miners were designed and built without any of the big companies participating. So I’m not following why we need Intel to be involved.
An obvious way to set up the flywheel here is to hire experts to do professional services or consulting on customer-submitted designs while you build up your corpus. While I said “fine-tuning”, there is probably a lot of agent scaffolding to be built too, which disproportionately helps bigger companies with more work throughput. (You can also acquire a company with the expertise and tooling, as Apple did with PA Semi in ~2008, though obviously a $100m order of magnitude is out of reach for a startup. https://www.forbes.com/2008/04/23/apple-buys-pasemi-tech-ebi...)
I doubt any real expert would be tempted by an offer to author new material, because that cannot be done in a good way.
One could author some projects that can be implemented in FPGAs, but those do not provide good training material for generating code that could be used to implement a project in an ASIC, because the constraints of the design are very different.
Designing an ASIC is a year-long process and it is never completed before testing some prototypes, whose manufacture may cost millions. Authoring some Verilog or VHDL code for an imaginary product that cannot be tested on real hardware prototypes could result only in garbage training material, like the code of a program that has never been tested to see if it actually works as intended.
Learning to design an ASIC is not very difficult for a human, because a human does not need a huge number of examples the way ML/AI does. Humans learn the rules, and a few examples are enough for them. I have worked at a few companies designing ASICs. While those companies had some internal training courses for their designers, those courses only taught their design methodologies, with practically no code examples from older projects, so very unlike how an LLM would have to be trained.
I disagree with most of the reasoning here, and think this post misunderstands the opportunity and economic reasoning at play here.
> If Gary Tan and YC believe that LLMs will be able to design chips 100x better than humans currently can, they’re significantly underestimating the difficulty of chip design, and the expertise of chip designers.
This is very obviously not the intent of the passage the author quotes. They are clearly talking about the speedup that can be gained from ASICs for a specific workload, e.g. dedicated mining chips.
> High-level synthesis, or HLS, was born in 1998, when Forte Design Systems was founded
This sort of historical argument is akin to arguing “AI was bad in the 90s, look at Eliza”. So what? LLMs are orders of magnitude more capable now.
> Ultimately, while HLS makes designers more productive, it reduces the performance of the designs they make. And if you’re designing high-value chips in a crowded market, like AI accelerators, performance is one of the major metrics you’re expected to compete on.
This is the crux of the author's misunderstanding.
Here is the basic economics explanation: creating an ASIC for a specific use is normally cost-prohibitive because the cost of the inputs (chip design) is much higher than the outputs (performance gains) are worth.
If you can make ASIC design cheaper on the margin, and even if the designs are inferior to what an expert human could create, then you can unlock a lot of value. Think of all the places an ASIC could add value if the design was 10x or 100x cheaper, even if the perf gains were reduced from 100x to 10x.
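To put made-up numbers on it: if a bespoke ASIC costs $20M to design and its 100x speedup is only worth $5M in compute savings over the product’s life, the chip never gets built. If tooling cuts the design cost to $1M, then even a 10x speedup worth $3M clears the bar comfortably. (Purely illustrative figures, but that is the shape of the argument.)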
The analogous argument is “LLMs make it easier for non-programmers to author web apps. The code quality is clearly worse than what a software engineer would produce but the benefits massively outweigh, as many domain experts can now author their own web apps where it wouldn’t be cost-effective to hire a software engineer.”