
The thesis of that essay is that

1. Nvidia GPUs are dominant at training

2. Inference is easier than training, so other cards will become competitive in inference performance.

3. As AI applications start to proliferate, inference costs will start to dominate training costs.

4. Hence Nvidia's dominance will not last.

I think the most problematic assumption is 3. Every AI company we've seen so far is locked in an arms race to improve model performance. Getting overtaken by another company's model is very harmful to the business (see Midjourney's Reddit activity after DALL-E 3), while a SOTA release instantly produces large revenue leaps.

We also haven't reached the stage where most large companies can fine-tune their own models, given the sheer complexity of engineering involved. But this will be solved with a better ecosystem, and will then trigger a boom in training demand that does scale with the number of users.

Will this hold in the future? Not indefinitely, but I don't see it ending in, say, 5 years. We are far from AGI, so scaling laws + market competition mean training runs will grow just as fast as inference costs.

Also, 4 is very questionable. Nvidia's cards are not inherently disadvantaged at inference: they may not be specialized ASICs, but they are good enough for the job and come with an extremely mature ecosystem. The only reason other cards can be competitive against Nvidia's in inference is Nvidia's 70% margins.

Therefore, all Nvidia needs to do to defend against attackers is lower its margins. It will still be extremely profitable; its competitors, not so much. This is already showing in the A100/H100 bifurcation: H100s are used for training, while the now-older A100s are used for inference. Inference-card vendors will need to compete against a permanently large stock of retired Nvidia training cards.

Apple is still utterly dominant in the phone business after nearly 2 decades. They capture the majority of the profits despite

1. Not manufacturing their own hardware

2. Chinese and Korean manufacturers holding the majority of market share by units sold

If inference is easy while training is hard, it could just lead to Nvidia capturing all the prestigious, easy profits from training, while the inference market becomes a brutal low-margin business with 10 competitors. That would be the Apple situation.




> See Midjourney's Reddit activity after DALL-E 3

What stats are you looking at? Looking at https://subredditstats.com/r/midjourney, I see a slower growth curve after the end of July, but still growing and seemingly unrelated to the DALL-E 3 release, which was more like the end of October publicly.


> We are far from AGI

Do you mind expanding on this? What do you see as the biggest things that make that milestone > 5 years away?

Not trolling, just genuinely curious -- I'm a distributed systems engineer (read: dinosaur) who's been stuck in a dead-end job without much time to learn about all this new AI stuff. From a distance, it really looks like a time of rapid and compounding growth curves.

Relatedly, it also does look -- again, naively and from a distance -- like an "AI is going to eat the world" moment, in the sense that current AI systems seem good enough to apply to a whole host of use cases. It seems like there's money sitting around just waiting to be picked up in all different industries by startups who'll train domain-specific models using current AI technologies.


Intelligence lies on a spectrum. So does skill generality. Ergo, AGI is already here. Online discussions conflate AGI with ASI -- artificial superhuman intelligence, i.e. sci-fi agents capable of utopian/dystopian world domination. When misused this way, AGI becomes a crude binary which hasn't arrived. With this unearned latitude, people subsequently make meaningless predictions about when silicon deities will manifest. Six months, five years, two decades, etc.

In reality, your gut reaction is correct. We have turned general intelligence into a commodity. Any aspect of any situation, process, system, domain, etc. which was formerly starved of intelligence may now be supplied with it. The value unlock here is unspeakable. The possibilities are so vast that many of us fill with anxiety at the thought.


When discussing silicon deities, maybe we can skip the Old Testament's punishing, all-powerful deity and reach for Silicon Buddha.


Sounds like somebody has watched the movie "Her"


I also think the space of products that involve training per-customer models is quite large, much larger than might be naively assumed given what is currently out there.

It may be true that inference is 100x larger than training in terms of raw compute. But I think it very well could be that inference is only 10x larger, or same-sized.
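For a rough sense of that ratio, here is a back-of-envelope sketch using the common approximations (training costs roughly 6 * params * training-tokens FLOPs, inference roughly 2 * params FLOPs per generated token); every concrete number in it is an assumption for illustration, not a measurement of any real deployment:

    # Back-of-envelope training-vs-inference compute, using the common
    # approximations: training ~ 6 * N * D FLOPs, inference ~ 2 * N FLOPs
    # per generated token. All concrete numbers are illustrative assumptions.
    params = 70e9             # assumed model size: 70B parameters
    training_tokens = 2e12    # assumed training corpus: 2T tokens
    tokens_per_query = 500    # assumed output length per request
    queries_per_day = 50e6    # assumed traffic
    days_deployed = 365

    train_flops = 6 * params * training_tokens
    infer_flops = 2 * params * tokens_per_query * queries_per_day * days_deployed

    print(f"training:  {train_flops:.1e} FLOPs")
    print(f"inference: {infer_flops:.1e} FLOPs")
    print(f"ratio (inference/training): {infer_flops / train_flops:.1f}x")

With those particular assumptions the two come out the same order of magnitude; crank the traffic or the retraining cadence up or down and the ratio moves with it.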

And besides, you can look at it in terms of sheer inputs and outputs. The size of the data yet to be trained on is absolutely enormous. Photos and video, multimedia. Absolutely enormous. Hell, we need giant stacks of H100s for text. Text! The most compact possible format!


I also think it's ludicrous to think that NVIDIA hasn't witnessed the rise of alternate architectures and isn't either actively developing them (for inference) or seriously deciding which of the many startups in the field to outright buy.


They already have inference-specialized designs and architectures, for example the entire Jetson line, which is inference-focused (you can train on them, but why would you?). They have several DLA accelerators on chip besides the GPU that are purely for inference tasks.

I think Nvidia will continue to be dominant because it's still a lot easier to go from CUDA training to CUDA inference (TensorRT-accelerated, let's say) than to migrate your model to ONNX to get it running on some weird inference stack.
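To make that concrete, here is a minimal sketch of the two paths with a throwaway torchvision model; the TensorRT path is left as a comment because exact torch_tensorrt usage varies by version and needs an Nvidia GPU, so treat all of it as an illustration rather than a recipe:

    # Minimal sketch of the two deployment paths contrasted above.
    # The model, shapes, and file name are placeholders for illustration.
    import torch
    import torchvision

    model = torchvision.models.resnet18(weights=None).eval()
    dummy = torch.randn(1, 3, 224, 224)

    # Path 1: stay on the CUDA stack -- roughly "compile and serve" on the same
    # stack you trained on (needs an Nvidia GPU; API varies by version):
    #   trt_model = torch_tensorrt.compile(model.cuda(), inputs=[dummy.cuda()])

    # Path 2: leave the CUDA stack -- export to ONNX first, then deal with
    # whatever ops and precisions the target inference runtime supports.
    torch.onnx.export(model, dummy, "resnet18.onnx", opset_version=17)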


> you can train on them, but why would you?

Because you want to learn a new gait on your robot in a few minutes.


Well sure, if your model is small and light enough. But there's no training a 7B+ model on one (well, you could, but it would be so, so, so slow). Like, decades?
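For a rough order-of-magnitude check (both numbers below are assumptions, not specs for any particular Jetson):

    # Order-of-magnitude estimate for pretraining a 7B-parameter model on a
    # single embedded module, using training FLOPs ~ 6 * params * tokens.
    # Both inputs are illustrative assumptions, not measured specs.
    params = 7e9
    training_tokens = 1e12        # assumed 1T-token pretraining corpus
    sustained_flops = 10e12       # assumed ~10 TFLOPS sustained on the module

    total_flops = 6 * params * training_tokens          # ~4.2e22 FLOPs
    years = total_flops / sustained_flops / (3600 * 24 * 365)
    print(f"~{years:.0f} years")                        # lands north of a century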


Unless there is a collapse in AI, I would suspect inference will just keep exploding, and as prices go down, volume will go up. Margins will go down and maybe we will land back at prices similar to standard GPUs. Still very expensive, but not crazy.


Right, and they don't even need to lower their training margins: since training needs a fancy interconnect, they can just ship the same chip at different prices based on interconnect (and they're already doing so with the 40xx vs the H100).



