I had a smart TV that gradually got slower and slower until it became basically useless. I figured it was just running out of RAM as apps got larger with updates over the years.
So, is this just an example of the first-mover disadvantage (or maybe of the problem of producing public goods)? The first AI models were orders of magnitude more expensive to create, but now that they're here we can, with techniques like distillation, replicate them at a fraction of the cost. I am not really literate in the law, but weren't patents invented to solve exactly this kind of problem?
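To make "distillation" concrete, here's a rough sketch of the idea (PyTorch-style; the teacher/student setup here is hypothetical, not any lab's actual recipe): a small student model is trained to match the output distribution of the expensive teacher, which is far cheaper than training the teacher was.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both output distributions with a temperature, then push the
    # student toward the teacher's distribution via KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Hypothetical training step: the expensive teacher only runs inference,
# and only the much smaller student gets gradient updates.
# loss = distillation_loss(student(batch), teacher(batch).detach())
```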
DeepSeek has demonstrated that there is no technical moat. Model training costs are plummeting, and the margins for APIs will just get slimmer. Plus model capabilities are plateauing. Once model improvement slows down enough, seems to me like the battle is to be fought in the application layer. Whoever can make the killer app will capture the market.
Model capabilities are not plateauing; in fact, they are improving exponentially. I believe people struggle to grasp how AI works and how it differs from other technologies we've invented. Our brains tend to think linearly; that's why we see AI as an "app." With AI (ASI), everything accelerates. There will be no concept of an "app" in an ASI world.
Hype. We were already plateauing with next-token prediction, and letting the models think out loud has simply pushed the frontier a little bit in purely test-taking domains.
Can you give examples? GPT-4 came out in 2023, and since then nothing comparable to the 3.5-to-4 (or 2-to-3) jump has come out. It has been two years now. All signs point towards OpenAI struggling to get improvements from its LLMs; the new releases have been minor since 2023.
Money is not cheap anymore, and OpenAI costs a lot to run. The longer it takes between impactful releases, the harder it gets for OpenAI to raise money, especially in the face of significant competition domestically, in the open-source world, and from China.
They are absolutely not improving exponentially, by any metric. This is a completely absurd claim. I don’t think you understand what exponential means.
Something worth noting is that ChatGPT currently is the killer app -- DeepSeek's chart-topping app notwithstanding (it's not yet clear whether that's a viral blip or a long-term trend).
ChatGPT Plus gives me a limited number of o1 calls, and o1 doesn't have web access, so I have mostly been using 4o over the last month and supplementing it with DeepSeek over the last week when I need advanced reasoning (with web search in DeepSeek as a bonus).
The killer app had better start giving better value, or I'd gladly pay the same amount to DeepSeek for unlimited access if they decided to charge.
For me, ChatGPT was not that useful for work; the killer app was Cursor. It'll be similar for other industries: the AI needs to be integrated directly into core business apps.
Can you unpack why you think there'll be defensible moats at the application layer?
(I thought you had this exactly right when I read it, but I kept noodling on it while I brushed my teeth, and now I'm not so sure LLMs won't just prove hard to build durable margins on at meaningful volume?)
I agree that the moats are weak for applications, but I think there are possible strategies to capture users. One way is to make it difficult to switch, similar to Apple Music vs Spotify or iPhone vs Android: although these platforms offer very similar features, switching has high friction. As an example, I can imagine an AI app that has adapted to your use cases, with a lot of useful information stored about you and your projects that makes it much more effective, so switching to another app would mean moving all that data or possibly starting over from zero.
> The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.
I agree. That's why I think the next step is automating trivial physical tasks, i.e. robotics, not automating nontrivial knowledge tasks.
This was my first thought too. AFAIK each layer encodes different information, and it's not clear that the last layer would be able to communicate well with the first layer without substantial retraining.
In a CNN, for instance, if you fed later representations back into the first kernels, they wouldn't find anything meaningful, because the input is no longer the image; it's some latent representation of the image that the early kernels weren't trained on.
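A rough PyTorch sketch of what I mean (the toy layers are hypothetical, not any real architecture): the shapes don't even line up, and even if you force them to with a projection, the first layer's kernels still encode pixel-space statistics rather than latent-space ones.

```python
import torch
import torch.nn as nn

# Toy CNN: conv1's kernels are (notionally) trained on raw RGB pixel statistics.
conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)

x = torch.randn(1, 3, 64, 64)          # a raw "image"
latent = conv2(torch.relu(conv1(x)))   # later representation: 32 latent channels

# Feeding the latent straight back into conv1 doesn't even type-check:
# conv1(latent)  # RuntimeError: expected input with 3 channels, got 32

# Even with a hypothetical 32->3 projection so the shapes line up, conv1's
# kernels still detect pixel-space features, so the result is not a
# meaningful "first-layer" response without retraining.
project = nn.Conv2d(32, 3, kernel_size=1)
out = conv1(project(latent))
```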