each card is not 20x more useful lol. there's no evidence yet that the deepseek architecture would even yield a substantially (20x) more performant model with more compute.
if there's evidence to the contrary I'd love to see it. in any case I don't think an h100 is even 20x better than an h800 anyway, so the 20x increase has to be wrong.
We need GPUs for inference, not just training. The Jevons Paradox suggests that reducing the cost per token will increase the overall demand for inference.
Also, everything we know about LLMs points to an entirely predictable correlation between training compute and performance.
The Jevons paradox doesn't really suggest anything by itself; it's something that occurs in some instances of increased efficiency, but not all. I suppose the important question here is: what is the price elasticity of demand for inference?
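To put a toy number on that question: whether total spend on inference rises after a 10x price cut depends entirely on whether demand grows by more or less than 10x. A back-of-the-envelope sketch (all figures made up, purely illustrative):

```python
# Toy illustration (made-up numbers): whether a Jevons-style outcome occurs
# depends on the price elasticity of demand for inference.
def total_spend(price_per_mtok: float, mtok_demanded: float) -> float:
    """Total spend = price per million tokens * millions of tokens demanded."""
    return price_per_mtok * mtok_demanded

base = total_spend(10.0, 1_000)          # $10/Mtok, 1,000 Mtok demanded -> $10,000

# Price drops 10x. Two hypothetical demand responses:
inelastic = total_spend(1.0, 3_000)      # demand only 3x   -> $3,000 (total spend falls)
elastic   = total_spend(1.0, 30_000)     # demand grows 30x -> $30,000 (Jevons: spend rises)

print(base, inelastic, elastic)
```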
Personally, in the six months prior to the release of the DeepSeek-V3 API, I'd made probably 100-200 API calls per month to LLM services. In the past week I made 2.8 million API calls to DSv3.
Processing each English (word, part-of-speech, sense) triple in various ways: generating (very silly) example sentences for each triple in various styles, and generating 'difficulty' ratings for each triple. Two examples below, with a rough sketch of the kind of call involved after them.
High difficulty:
id = 37810
word = dendroid
pos = noun
sense = (mathematics) A connected continuum that is arcwise connected and hereditarily unicoherent.
elo = 2408.61936886416
sentence2 = The dendroid, that arboreal structure of the Real, emerges not as a mere geometric curiosity but as the very topology of desire, its branches both infinite and indivisible, a map of the unconscious where every detour is already inscribed in the unicoherence of the subject's jouissance.
Low difficulty:
id = 11910
word = bed
pos = noun
sense = A flat, soft piece of furniture designed for resting or sleeping.
elo = 447.32459484266
sentence2 = The city outside my window never closed its eyes, but I did, sinking into the cold embrace of a bed that smelled faintly of whiskey and regret.
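Roughly, each of those per-triple calls is just a chat-completion request against DeepSeek's OpenAI-compatible API. This is an illustrative sketch only: the base URL and model name follow DeepSeek's published docs, but the prompt, style, and parameters are placeholders rather than the exact ones behind the examples above.

```python
# Illustrative sketch: one example-sentence request per (word, pos, sense) triple.
# Assumes DeepSeek's OpenAI-compatible endpoint; prompt/params are placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder
    base_url="https://api.deepseek.com",   # DeepSeek's OpenAI-compatible endpoint
)

def example_sentence(word: str, pos: str, sense: str, style: str) -> str:
    """Ask the model for one example sentence using the word in the given sense."""
    resp = client.chat.completions.create(
        model="deepseek-chat",  # DeepSeek-V3 behind the chat endpoint
        messages=[{
            "role": "user",
            "content": (
                f"Write one example sentence in the style of {style}, "
                f"using the {pos} '{word}' in this sense: {sense}"
            ),
        }],
        temperature=1.0,
    )
    return resp.choices[0].message.content

print(example_sentence(
    "dendroid", "noun",
    "(mathematics) A connected continuum that is arcwise connected "
    "and hereditarily unicoherent.",
    "an overwrought literary theorist",  # placeholder style
))
```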
the jevons paradox isn't about any particular product or company's product, so it's irrelevant here: the relevant resource is compute, which is already a commodity. secondly, even if it were about GPUs in particular, there's no evidence that nvidia would be able to sustain such high margins if fewer cards were necessary for equivalent performance. things are currently supply constrained, which gives nvidia price optionality.
> there's no evidence yet that the deepseek architecture would even yield a substantially more performant model with more compute.
It's supposed to. There were reports that the longer 'thinking' length is what makes o3 better than o1, i.e. compute still matters, at least at inference time.
> It's supposed to. There were reports that the longer 'thinking' length is what makes o3 better than o1, i.e. compute still matters, at least at inference time.
compute matters, but from what I've heard about o3 vs o1, performance doesn't scale proportionally with the extra compute.
you shouldn't take my word for it - go on the leaderboards, look at the top models now and the top models from 2023, and compare the compute involved for each. there's obviously a huge increase in performance, but it isn't proportional to the increase in compute.
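to put rough numbers on "not proportional": under the usual power-law picture of scaling, loss improves like a small negative power of compute, so a 100x jump in compute buys only a modest improvement. a toy sketch (the constants are illustrative placeholders, not fitted values from any paper):

```python
# toy power-law scaling sketch: loss ~ irreducible + a * C**(-alpha)
# constants are illustrative placeholders, not fitted values
def loss(compute: float, irreducible: float = 1.7, a: float = 20.0, alpha: float = 0.05) -> float:
    return irreducible + a * compute ** -alpha

for c in (1e23, 1e25):  # two training-compute budgets 100x apart (illustrative)
    print(f"{c:.0e} FLOPs -> loss ~ {loss(c):.2f}")

# 100x more compute only moves the toy loss from ~3.12 to ~2.82,
# i.e. nothing like a proportional (100x) improvement
```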