
The costs are a problem. We don't have hard evidence that this will be solved, but with algorithmic efficiency and raw compute costs both improving rapidly, the cost per token has gone down by about a factor of 10 per year for the last 3 years, i.e. roughly 1000x in total.
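The compounding here is easy to check: a steady 10x annual decline sustained for 3 years multiplies out to 1000x. A one-liner sketch (the 10x/year figure is the comment's claim, not measured data):

```python
# Compound a steady per-year cost decline into a total reduction factor.
def total_reduction(factor_per_year: float, years: int) -> float:
    return factor_per_year ** years

print(total_reduction(10, 3))  # 10x/year for 3 years -> 1000x cheaper
```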

As far as I can tell, those charts are merely describing the price per token that LLM hosting companies are charging, not what running the model actually costs. The distinction is important for two reasons:

1. These companies are heavily subsidized by huge amounts of venture investment

2. If I'm integrating this technology into my web product there's absolutely no way I'll be adding a 3rd party company as a dependency. This is all way too new and bubbly to trust any of the current offerings will still exist in O(years).

Are there any similar studies showing not sticker price but actual compute/performance decrease?


This is incorrect; there are real efficiency gains.

Slightly old by the standards of this field, but a good overview: https://arxiv.org/abs/2403.05812


I'm sorry, what is incorrect? And the paper you linked appears to be about training, not inference? I wouldn't train a model in a user's session; I'd run the model. That's the cost that seems to be a blocker: the models are already trained, I don't need to invest a dime in that.

The real costs are dropping because of real efficiency gains and compute cost reductions.

The inference costs are dropping at a similar rate.


[citation needed]


This is getting awfully tedious. Are you trolling or do you genuinely think responding to a question about actual compute cost trends with a venture fund's puff piece about sticker price trends is helpful?

You are correct to note that the 'real' costs of the leading labs are not public. It is surely true that the labs are operating at below cost (we are definitely not paying for the full R&D), but it seems unlikely that this fully explains the reduction in inference costs over the last few years. We also know from open models like DeepSeek that the cost per inference token at a fixed performance level is going down very quickly, matching the curve of the leading labs' inference cost decreases. You can even test it yourself on your own PC if you want.
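One back-of-the-envelope way to run that test: serve an open model on hardware whose price you know, measure sustained tokens per second, and derive a cost per token. A minimal sketch; the $2/hour and 1000 tokens/s figures below are hypothetical placeholders, not measurements:

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per million generated tokens on hardware you rent or own."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical: a $2/hour GPU sustaining 1000 tokens/s
print(round(cost_per_million_tokens(2.0, 1000.0), 2))  # -> 0.56
```

Tracking this number across hardware generations at a fixed model quality is exactly the "real cost" trend, as opposed to the sticker price, that the thread is asking about.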

I would add that inference cost decreases are what we should expect: it stands to reason that there will be algorithmic improvements in inference, and compute cost is still going down thanks to a (somewhat slowed) Moore's law.

Maybe you could also be a bit friendlier and more forthcoming in your responses.


I apologize for my tone. It's just very frustrating to ask a question about applying this technology in the real world, to actual commercial products, only to get reply after reply of hopes and dreams. 20 years ago Ray Kurzweil promised me I'd have artificial hemoglobin that allows me to hold my breath underwater for an hour. Where is it? These are the arguments of grifters and conmen: "just wait look at the exponential growth!" No. I refuse. If this technique doesn't work right now then it simply doesn't work and we're all (except researchers and companies developing the core technology) wasting immense amounts of time and money thinking about it.


