LLMs are cheap (snellman.net)
344 points by Bogdanp 13 days ago | 309 comments





You can't compare an API that is profitable (search) to an API that is likely a loss-leader to grab market share (hosted LLM cloud models).

Sure, there might not be any analysis that proves they are subsidized, but you also don't have any evidence that they are profitable. All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.

You're also comparing two products in very different spots in the maturity lifecycle. There's no way to justify losing money on a decade-old product that's likely declining in overall usage -- ask any MBA (as much as engineers don't like business perspectives).

(Also you can reasonably serve search queries off of CPUs with high rates of caching between queries. LLM inference essentially requires GPUs and is much harder to cache between users since any one token could make a huge difference in the output)


> you also don't have any evidence that they are profitable.

Sure we do. Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

> All the data points we have today show that companies are spending an insane amount of capex on gaining AI dominance without the revenue to achieve profitability yet.

Yes, capex not opex. The cost of running inference is opex.


No, we don't. MS used their OpenAI position as a strategy to increase Azure adoption. I am surprised AWS didn't give it away for free.

Microsoft's plans for openai are much bigger than just Azure. They have 30 products/services with copilot in the name already. And they're pushing them like crazy to end users.

> Yes, capex not opex. The cost of running inference is opex.

This seems sort of interesting, maybe (I don’t know business, though). I agree that the cost of running inference is part of the opex, but saying that doesn’t rule out putting other stuff in the opex bucket.

Currently these LLM companies train models on rented Azure nodes in an attempt to stay at the head of the pack, to be well positioned for when LLMs become really useful in a “take many white collar jobs” sense, right?

So, is it really obvious what’s capex and what’s opex? In particular:

* The nodes used for training are rented, so that’s opex, right?

* The models are in some sense consumable? Or at least temporary. I mean, they aren’t cutting edge anymore after a year or so, and the open weights models are always sneaking up on them, so at least they aren’t a durable investment.


> The nodes used for training are rented, so that’s opex, right?

It’s capex. They are putting money in, and getting an asset out (the weights).

> The models are in some sense consumable?

Assets depreciate.


Obsolete software doesn’t depreciate like obsolete hardware. If an LLM company has trained a truly better model, they can simply make as many copies of their own model as they want. Thus, if the new model is truly better in every way, the old one is completely valueless to them (of course there might be some tradeoffs which mean older models can stick around because they are, say, smaller… but, ultimately they will be valueless after some time).

Because models are still being obsoleted every couple years, old models aren’t an asset. They are an R&D byproduct.


> the old one is completely valueless to them

This is of course untrue for the same reason that people are still running Windows 2000.


> This is of course untrue for the same reason that people are still running Windows 2000.

What is the reason?


They’ve built processes around it and don’t feel like/can’t afford to/ don’t know to how change them.

I guess we’ll see how that shakes out.

Because models are getting much better every couple months, I wonder if getting too attached to a process built around one in particular is a bad idea.


I would agree if Windows 2000 had the exact same APIs as the next version, but it doesn't. LLMs are text in -> text out, and you can drop in a new LLM and replace them without changing anything else. If anything, newer LLMs will just have more capabilities.

> LLMs are text in -> text out, and you can drop in a new LLM and replace them without changing anything else. If anything, newer LLMs will just have more capabilities.

I don't mean to be too pointed here, but it doesn't sound like you have built anything at scale with LLMs. They are absolutely not plug n play from a behavior perspective. Yes, there is API compatibility (text in, text out) but that is not what matters.

Even frontier SOTA models have their own quirks and specialties.


When I've built things with them I've mostly considered the quirks defects.

Kind of like how httpds will have quirks but those aren't really a good thing and they're kind of plug and play.


What kind of quirks have you seen that the next model wasn't better at?

A simple example would be when models get better at following instructions, the frantic and somewhat insane-sounding exhortations required to get the crappier model to do what you want can cause the stronger model to be a bit too literal and inflexible.

> when LLMs become really useful

It looks to me similar to the situation with that newly fashionable WWW thing in, say, 1998. Everybody tried to use it, in search of some magic advantage.

Take a look at the WWW heavyweights today: say, Amazon, Google, Facebook, TikTok, WeChat. Are the web technologies essential for their success? Very much so. But TCP/IP + HTML + CSS + JS are mere tools that enable their real technical and business advantages: logistics and cloud computing, ad targeting, the social graph, content curation for virality, strong vertical integration with financial and social systems, and other such non-trivial things.

So let's wait until a killer idea emerges for which LLMs are a key enabler, but not the centerpiece. Making an LLM the centerpiece is the same thinking that was trying to make catchy domain names the centerpiece, leading to the dot com crash.


AWS isn’t doing the training on those models.

OpenAI spends less on training than inference, so the worst case scenario is less than double the cost after factoring in training. Inference is still cheap.

Inference is cheap. Training is cheaper. Then where's all the money going? OpenAI is reporting heavy losses, but you're saying the unit economics of inference are all good. What are they spending money on?

Their spending is not a problem. It's quite low for a top-tier hard tech company that's also running a consumer service with 500M active users. They are making a loss because 95% of their users are on free accounts, and for now they're choosing not to monetize those users in any way (e.g. ads).

A few months ago, sama tweeted that the $200 tier was priced too low to cover costs.

At that price level you run into serious adverse selection

Meaning someone paying $200 monthly is going to use it as much as possible to get their money's worth.

I slightly disagree.

My hypothesis would be that the distribution for $200 users would be bimodal.

That is, there would be one concentration of super heavy power users.

The second concentration would be people who want the "best AI" but are not power users, and who assume the most expensive option is the best.

Their actual usage would be just like the normal free tier of ChatGPT.


How credible are his PR statements, though?

You think "we're losing money on this subscription tier" is good PR for their investors?

"This car is priced so low we're practically giving it away!"

But do you actually own anything on a subscription, in a market that has normalized price hikes?

There used to be contracts with service providers, and that — IIRC — usually shielded consumers from exorbitant increases.


Salary, mostly. It's useful to separate out the GPU cost of training from the salary cost of the people who design the training systems. They are expensive.

That does not mean, however, that inference is unprofitable. The unit economics of inference can be profitable even while the personnel costs of training next-generation models are extraordinary.


> Then where's all the money going?

They are giving vast amounts of inference away as part of their free tier to gain market share. I said inference is cheap, not that it is free. Giving away a large amount of a cheap product costs money.

> you're saying the unit economics of inference are all good

Free tiers do not contradict positive unit economics.


Salaries?

This is not generally true. Inference costs have just begun to spike, starting with the 'test-time scaling' trend[1]. I imagine most OpenAI users are on the free tier, and the mini models available to them only cost a few cents per task[2]. The chart from The Information featured in this Reddit thread seems more reasonable[3].

That said, the chart was posted in October, so there wasn't much time for the reasoning models' costs to show up. It's also important to note that their revenue is on track to more than double this year[4], and one can't get a complete picture without understanding how much of that revenue is spent on the inference provided by these reasoning models.

[1] https://techcrunch.com/2024/12/23/openais-o3-suggests-ai-mod...

[2] https://techcrunch.com/2024/12/23/openais-o3-suggests-ai-mod...

[3] https://www.reddit.com/r/singularity/comments/1g0acku/someho...

[4] https://techcrunch.com/2025/06/09/openai-claims-to-have-hit-...


> You think AWS are going to subsidise your usage of somebody else’s models

Yes

>indefinitely?

No, and that's the point.


Have they ever actually done this? I can't think of a time they've ever raised their prices, other than Route53 passing on registrar costs.

They started charging for public IP addresses in early 2024, which was a price increase from zero.

That assumes that they've been similarly situated with an offering that isn't profitable and has no path to profitability.

What company selling a primarily AI-based service right now is making a profit on that service?


> Go to AWS or any other hosting provider and pay them for inference. You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

Not indefinitely or at any undetermined scale, but AWS regularly subsidises up to $100k [0] in credits. It would not surprise me in the slightest if most. Inference is much cheaper than training, and $100k in compute covers a decent amount of usage. Activate is tiered over 3 years, so if you want to know the full story, let’s see how many of these services are still around in 18 months. I suspect that, just like when games were the flavor of the month and then crypto, we’ll see the real story when they actually have to pay a bill and their investors aren’t seeing any growth.

[0] https://aws.amazon.com/activate/activate-landing/


I added “indefinitely” precisely because I wanted to rule out discussion of the free credits. Those are clearly a loss-leader to get people to choose AWS and aren’t relevant to the true cost of inference.

The point is that all of these projects are only viable when salaries are VC funded and the opex of inference is close to 0. It’s easy to say that nobody will subsidise inference if you exclude the main subsidies

Purchasing new GPUs is capex but depreciation of GPUs is opex.

There's still a cost, it's just thrown into the future.
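Straight-line depreciation is the simplest way to see how a one-off capex purchase becomes a recurring opex line. A minimal sketch; the $30,000 price, zero salvage value and four-year life are made-up placeholders, not any company's actual GPU prices or accounting schedule.

```python
def straight_line_depreciation_per_hour(purchase_price, salvage_value, useful_life_years):
    """Spread (purchase_price - salvage_value) evenly over the useful life, per hour."""
    hours = useful_life_years * 365 * 24
    return (purchase_price - salvage_value) / hours


# Hypothetical inputs, for illustration only.
per_hour = straight_line_depreciation_per_hour(30_000, 0, 4)
print(f"${per_hour:.3f} per GPU-hour")  # $0.856 per GPU-hour
```

The cost doesn't disappear; it just shows up as a steady expense for as long as the hardware is on the books.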


Capex and opex are just accounting labels that help categorize costs and can improve planning ability. But at the end of the day a billion dollars is a billion dollars.

They’re significant here because opex impacts profits while capex sort of doesn’t. They have a path to profitability if revenue > opex, by quitting growth and slashing capex.

Lots of hand waving, but that’s the idea.


I believe there are a fair number of tax implications involved with that bucketing, though. My understanding is that capex is taxed at a lower rate than opex, but I may be wrong on the specifics.

> You think AWS are going to subsidise your usage of somebody else’s models indefinitely?

As with Costco's giant $5 roasted chickens, this is not solid evidence they're profitable. Loss-leaders exist.


Rather than speculating, another option is to just measure things. I churned through billions of tokens for evals and synthetic data earlier this year, so I did some of that. On an H100 node, Llama 3 70B in FP8 at concurrency=128 generated at about 0.4 J/token (estimated from node power consumption multiplied by a generous PUE, 1.2X or something like that). Even so, it was 120X cheaper than the 48 J/token estimate for running the 175B GPT-3 on 2021-era Microsoft DC1 hardware (Li et al. 2023) and 10X cheaper than the 3-4 J/token empirical measurements for running LLaMA-65B on V100/A100 HPC nodes (Samsi et al. 2023).

Anyway, 0.4 J/token at a cost of 5 cents/kWh works out to roughly 0.56 cents per million tokens. Even at 50% utilization you're only up to 1.1 cents/M tokens. Artificial Analysis reports the current average price of Llama 3.3 70B to be about $0.65/M tokens. I'd assume most of what you're paying for is the depreciation of the hardware.

Note that, of course, modern-day 7B-class models stomp on both those older models, so you could throw in another 10X lower cost if you're going to quality-adjust. Also, I did minimal perf tuning: I used FP8, but W8A8-INT8 is both faster and slightly better in quality (in my functional evals). I also used -tp 8 for my system; with -tp 4 plus model parallelism and cache-aware routing you should also be able to increase throughput a fair amount. Speculative decoding with a basic draft model would give you another boost. And this was tested at the beginning of the year, so with vLLM 0.6.x or so; the vLLM V1 engine is faster (better graph building, compilation, scheduling). I'd guess that if you were conscientious about optimizing, you could probably get at least another 2X perf for free with basically just config.
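The electricity arithmetic from the comment can be reproduced in a few lines. A minimal sketch covering energy only: the 0.4 J/token, 5 cents/kWh and 50% utilization inputs are the commenter's figures, and dividing by utilization assumes idle hardware draws as much power as busy hardware, which is a pessimistic simplification.

```python
J_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost_cents_per_million_tokens(joules_per_token, dollars_per_kwh, utilization=1.0):
    """Electricity cost, in cents, of generating 1M tokens."""
    kwh = joules_per_token * 1_000_000 / J_PER_KWH
    return kwh * dollars_per_kwh * 100 / utilization


print(round(energy_cost_cents_per_million_tokens(0.4, 0.05), 2))       # 0.56
print(round(energy_cost_cents_per_million_tokens(0.4, 0.05, 0.5), 2))  # 1.11
print(round(48 / 0.4))  # GPT-3 estimate vs. the measured H100 figure: 120
```

Against a price of about $0.65/M tokens (65 cents), even the 50%-utilization figure is under 2% of the price, which is why the remaining cost is attributed to hardware.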


My only question about this is the concurrency: is it really easy to leverage when you need to serve clients without much latency? I don't know much about this.

Yeah, actually for my batch usage, I usually push to 256+ concurrency, but on H100s at least, currently 64-128 is about the bend of the curve for where latency starts going out of control (this depends a lot on your context length and kvcache optimizations, though).

What I do for testing is run a benchmark_serving sweep (I prefer ShareGPT for a standard set that is slightly more realistic for caching) over the desired concurrency range (e.g. 4-1024 or something like that), then plot TTFT vs. total throughput and graph mean, P50, and P99. This gives you a clear picture of what concurrency/throughput you can get for a given desired latency.
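Once the sweep results are collected, picking an operating point can be automated. A minimal sketch, assuming you have already reduced your own benchmark output to a dict of `concurrency -> (total_throughput, list_of_ttft_seconds)`; that layout and the function names are hypothetical, not benchmark_serving's output format. It uses nearest-rank percentiles.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(ttft_samples):
    """Mean, P50 and P99 TTFT for one concurrency level."""
    return {
        "mean": statistics.fmean(ttft_samples),
        "p50": percentile(ttft_samples, 50),
        "p99": percentile(ttft_samples, 99),
    }


def best_concurrency(sweep, p99_ttft_slo):
    """Highest-throughput concurrency whose P99 TTFT stays within the SLO.

    `sweep` maps concurrency -> (total_throughput, list_of_ttft_seconds).
    Returns None if no setting meets the SLO.
    """
    ok = [
        (throughput, concurrency)
        for concurrency, (throughput, ttfts) in sweep.items()
        if percentile(ttfts, 99) <= p99_ttft_slo
    ]
    return max(ok)[1] if ok else None
```

Filtering on P99 rather than the mean is the conservative choice: the "bend of the curve" usually shows up in the tail first.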


Yes, if we discount the billion or so Facebook spent to train Llama3.

No, let's add it. The cost for an inference provider to deploy a trained and weights available existing model is $0 (or whatever you want to add for the HF download of the weights). Open weight models simply exist now. Deal with it?

If you would like to somehow add that as a line item, perhaps you should add the full embodied energy cost of Linux (please include the entire history of compute since it wouldn't exist without UNIX), or perhaps the full military industrial complex costs from the invention of the transistor? We could go further.


I love it! Can't forget the accumulated carbon costs of all the experimentation it took to master fire, ceramics, and metals smelting.

> API that is likely a loss-leader to grab market share (hosted LLM cloud models).

I don't think so, not anymore.

If you look at API providers that host open-source models, you will see that they have very healthy margin between their API cost and inference hardware cost (this is, of course, not the only cost) [1]. And that does not take into account any proprietary inference optimizations they have.

As for closed-model API providers like OpenAI and Anthropic, you can make an educated guess based on the not-so-secret information about their model sizes. As far as I know, Anthropic has extremely good margins between API cost and inference hardware cost.

[1]: This is something you can verify yourself if you know what it costs to run those models in production at scale, hardware wise. Even assuming use of off-the-shelf software, they are doing well.


You're leaving out their training costs. And while you might say "well, once they're trained you don't have to spend more on that," as we've seen they have to keep training new models on new data, such as current events and new language features and APIs. And some aspects of that training are becoming more costly, or more scarce, as companies like Reddit and Stack Overflow restrict and sell their data, less data gets produced on Stack Overflow as people switch to using LLMs instead, website operators go to more extreme measures to block AI scrapers that ignore robots.txt, etc.

Yeah, people tout RAG and fine-tuning, but lots of people just use the base chat model; if it doesn't keep up to date on new data, it falls behind. How much are these companies spending just keeping up with the Joneses?


I use Whisper to transcribe long conversations, and deploying the model myself on Vast.ai is ten times cheaper than OpenAI's API offer.

I’m assuming doing transcription on a vast GPU is also ten+ times faster than local options?

https://news.ycombinator.com/item?id=44225953


I don’t completely disagree, but “assertion one” [1]

[1] ~ you can obviously verify this yourself by doing it yourself and seeing how expensive it is.

…is an enormously weak argument.

You suppose. You guess. We guess.

Let’s be honest, you can just stop at:

> I don’t think so.

Fair. I don’t either; but that’s about all we can really get at the moment afaik.


No, the point of [1] is that this is not some "secret knowledge". My response is based on running models in production and comparing my costs with the costs I would pay to API providers running the same models.

he's not wrong: if you can run an open-weights model in any cloud, you can very straightforwardly estimate the cost of running it. considering that these providers either use long-term contracts or maybe even buy their own hardware, this theoretical cloud deployment is itself an overestimate of the costs
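That estimate is one division: hourly rental cost over tokens produced per hour. A minimal sketch; the $2.00/GPU-hour rate, 8 GPUs and 4,000 tokens/s aggregate throughput are hypothetical placeholders, to be replaced with your own quoted rental price and measured throughput.

```python
def cost_per_million_tokens(gpu_hourly_usd, num_gpus, tokens_per_second):
    """Rental cost in USD per 1M generated tokens at a sustained aggregate throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd * num_gpus / tokens_per_hour * 1_000_000


# Hypothetical: 8 GPUs at $2.00/GPU-hour sustaining 4,000 tokens/s in aggregate.
print(round(cost_per_million_tokens(2.00, 8, 4_000), 3))  # 1.111
```

Comparing that number with a provider's per-token list price for the same model gives a rough upper bound on their serving cost, for the reason stated above: rented cloud capacity is usually the expensive option.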

…and it's perfectly legit to run that, write the numbers down and link to it.

But:

A) it makes absolutely no difference to the fact you have no idea what the big LLM providers are actually doing.

B) Just asserting some random thing and saying “anyone competent can verify this themselves” is a weak argument. You're saying you've done the research, but failing to provide any evidence you actually have.

If you've crunched the numbers then man up and post them.

If not, then stop at “I think…”

“This is based on my experience running production workloads…” is a nice way of saying “I don't have any data to back up what I'm saying”.

If you did, you could just link to it.

…by not posting data you make your argument non-falsifiable.

It is just an opinion.


For example, Perplexity has been fudging their accounting numbers to shift COGS to R&D to make their margin appear profitable: https://thedeepdive.ca/did-perplexity-fudge-its-numbers/

Please read the DeepSeek analysis of their API service (linked in this article): they have a 500% profit margin, and they are cheaper than any of the US companies serving the same model. It is conceivable that the API services of OpenAI or Anthropic have even higher profit margins.

(GPUs are generally much more cost effective and energy efficient than CPU if the solution maps to both architectures. Anthropic certainly caches the KV-cache of their 24k token system prompt.)


That claim actually gives me pause. It reminds me of an idea from Zero to One by Peter Thiel - that real monopolies like to appear as a small fish in a very big pond, while tiny players try to appear as a monopoly.

So when I see a company bragging about "500% profitability," I can’t help but wonder if they’re even profitable at all.


I imagine pretty much none of them are profitable in the real accounting sense. However, if they all turned off their free plans -- they'd be insanely profitable.

Please read their report. There is no bragging. It just tries to document performance and clarify a misconception. The idea that LLM inference may not be profitable or may be energy inefficient has been a constant song of misinformation for reasons that I don't understand. DeepSeek does indeed claim to be of similar quality to others, but the work of their relatively small team is truly outstanding. As per a parallel thread, their result has by now been almost replicated by the sglang team. Link here: https://lmsys.org/blog/2025-05-05-large-scale-ep/

Every LLM provider caches their KV-cache, it's a publicly documented technique (go stuff that KV in redis after each request, basically) and a good engineering team could set it up in a month.

Are you saying if I ask a prompt "foo" and then a month later another user asks "foo" then it retrieves a cached value?

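No, the key-value cache is the context in a form the model can read: the attention keys and values already computed for those tokens. Reusing it skips recomputing a repeated prompt prefix; it does not store answers.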
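A toy illustration of prefix caching: work is saved on a shared prompt prefix (such as a long system prompt), while each request's own tokens still have to be processed and the answer is still generated fresh. This is a deliberately simplified sketch with hypothetical names; token IDs are plain ints, the "KV state" is a placeholder list rather than attention tensors, and matching is on the whole exact prefix. Real engines such as vLLM's automatic prefix caching or SGLang's RadixAttention match at block or token granularity and keep the cache in GPU memory.

```python
class PrefixKVCache:
    """Toy exact-match prefix cache (hypothetical, for illustration only)."""

    def __init__(self):
        self._store = {}          # tuple(prefix token IDs) -> cached per-token state
        self.tokens_computed = 0  # tokens that went through the (fake) prefill pass

    def _compute_kv(self, tokens):
        # Stand-in for the model's prefill pass over these tokens.
        self.tokens_computed += len(tokens)
        return [("k", t, "v", t) for t in tokens]

    def prefill(self, prefix, suffix):
        key = tuple(prefix)
        if key not in self._store:
            self._store[key] = self._compute_kv(prefix)
        # The shared prefix is reused; only the request-specific suffix is computed
        # (a real model would attend to the cached prefix while doing this).
        return self._store[key] + self._compute_kv(suffix)


system_prompt = list(range(1000))  # hypothetical 1,000-token shared system prompt
cache = PrefixKVCache()
cache.prefill(system_prompt, [5001, 5002])  # first request: 1,002 tokens computed
cache.prefill(system_prompt, [6001])        # second request: only 1 new token
print(cache.tokens_computed)                # 1003 instead of 2003
```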

With all due respect to DeepSeek, I would take their numbers with a grain of salt, as they may well be politically motivated.

Any more politically motivated than a model from anywhere else?

The current version of sglang allows inference with the R1 model at a cost that is very close to the rate DeepSeek claimed (using H100s, not exactly the DeepSeek compute). Their claim is almost validated by replication at this point, so there is nothing left to take with a grain of salt, other than the possibility that an even higher margin than they claimed exists if one were to optimize for modern NVIDIA hardware.

is that better or worse than commercially motivated?

commercial motivation needs to show eventual profit to be sustainable, while political does not.

though at the outset (pre-profit / private) it's hard to say there's much difference.


> though at the outset (pre-profit / private) it's hard to say there's much difference.

I think this is the tough part, we’re at the outset still.

Also, a political investment could be sustainable, in the sense that China might decide they are fine running DeepSeek at a loss indefinitely, if that’s what’s going on (hypothetically. Actually I have never seen any evidence to suggest DeepSeek is subsidized, although I haven’t gone looking).


Also, solar panel dumping as a quite successful example (on many, many fronts).

> an API that is likely a loss-leader to grab market share (hosted LLM cloud models)

Everyone just repeats this but I never buy it.

There is literally a service that allows you to switch models and service providers seamlessly (openrouter). There is just no lock-in. It doesn't make any financial sense to "grab market share".

If you sell something with UI, like ChatGPT (the web interface) or Cursor, sure. But selling API at a loss is peak stupidity and even VCs can see that.


You can't switch to a competitor that went out of business. If you lowball your rates, it starves startups of needed funds.

Depends on the business case. LLMs are slowly creeping into several workflows, and there determinism becomes more important than the latest reasoning abilities.

People are starting to let their LLMs parse text content. Be it mails, chats or transcriptions, the models often need to formalize their output, and switching models can become burdensome, while developers might switch models on a whim.

Doesn't mean you can capture a market by selling cheap though.


Except they most likely do have a plan to make it harder to switch.

Who is "they"? It makes no sense for Openrouter to allow providers that do not conform to the API. They profit from the commission from the fees and not providing inference.

Yeah, sure, please elaborate on how providers such as Fireworks, DeepInfra, Chutes are going to "make it harder to switch."

I'm talking about openAI, anthropic, Google, etc.

They'll offer consumer and enterprise integrations that will only work with their models.


yes. And they will try both carrots and sticks.

The carrots are already visible - think abstractions like "projects" in ChatGPT.


This is also the argument of the guy in the article, fyi (it's not a loss leader, no reason for it to be).

> You can't compare an API that is profitable (search) to an API that is likely a loss-leader to grab market share (hosted LLM cloud models).

Regardless of maturity lifecycle, by definition loss-leaders are cheap. If I go to the grocery store and milk is $1, I don't think I'm being swindled. I know it's a loss-leader and I buy it because it's cheap.

We are currently in the early-Netflix-massive-library-for-five-dollars-a-month era of LLMs and I'm here for it. Take all you can grab right now because prices will 100x over the next two years.


Want to bet? I'll give you 5:1 odds that tokens from a model with some specific benchmark performance (we can sort out the specific benchmarks or basket of benchmarks if you want to bet) will be cheaper two years from now.

Sure. To clarify, I'm not asserting that in two years today's 4o will be 100x more expensive, but the sum of many core offerings from various companies will be. It won't be unheard of for people to spend $2k-$10k/yr between many AI services

It's not unheard of for people to spend $13,000 on a pure silver frying pan. Is it common? No.

Like say 10% of people or more

I misunderstood your comment then, my assertion is that models which have the same capabilities as current models will be cheaper in the future. I have no doubts that models with more capabilities will be more expensive.

There are also a lot of different models at a lot of different price points (and LLMs are fairly hard to compare to begin with). In this theory of a likely loss-leader, must we assume that all of them, from all companies, are priced below cost...? If so, that seems like a fairly wild claim. What's Step 2 for all of these companies to get ahead of this, given how model development currently works?

I think the far more reasonable assumption is: It's profitable enough to not get super nervous about the existence of your company. You have to build very costly models and build insanely costly infrastructure. Running all of that at a loss without an obvious next step, because ALL of them are pricing to not even make money at inference, seems to require a lot of weird ideas about how companies are run.


We’ve seen this pattern before. This happened in the 1990s during the original dot-com boom. Investors gamble, everything is subsidized, most companies fail, and the ones left standing then raise prices.

I don't think it's that wild. Hardware will improve together with performance, but once the market stops expanding and behaviour stagnates, market shares will solidify, so you'd better aim for a large share so that scale, together with those improvements, helps you reach profitability.

The problem with this theory in general is that, given the sheer number of cloud inference providers (most of which are hosting third party models), it would be exceedingly strange if not only all of them are engaging in this same tactic, but apparently all of them have the same financial capacity to do so.

I analyzed OpenAI API profitability in summer 2024 and found inference for gpt-4 class models likely pretty profitable, ~50% gross margins (ignoring capex for training models): https://futuresearch.ai/openai-api-profit

That’s a little like saying you can compute the profitability of the energy market by looking only at the margins of gas stations. You can’t exclude all the outlays on actually acquiring the product to sell.

Sure - but is there any doubt in that example that gas stations are making a profit?

And unlike gasoline, once models are trained there is no significant ongoing production cost.


Models aren't static. In order for them to remain relevant, they have to be constantly retrained with new data. Plus there's a model arms race going on, which will probably continue for the foreseeable future.

Fair point - though various distilling and retraining tricks do reduce the cost quite a bit. It’s not like everyone is doing all the work they had to do from scratch, every time.

We don’t know what the marginal cost of inference is yet however. So far, users are demonstrating that they are willing to pay more for LLMs than traditional web experiences.

At the same time, cards have gotten >8x more efficient over the last 3 years, inference engines >10x more efficient, and the raw models are at least treading water if not becoming more efficient. It’s likely that we’ll cut another 10-100x off the cost of inference in the next 2 years.


Yep, spot on. Price does not equal cost, especially in our current economy where profit has been artificially made a non-factor. To know the cost, you'd have to look at hardware resource usage per query. Given that recent models have over a trillion parameters, you need a huge amount of memory and compute to process a query, to get the electrons to traverse all these thousands of billions of ANN nodes and/or weights.

Ultimately, it may turn out that dumber models may be more economically efficient than smarter models once you ignore the investment subsidy factor.

Maybe, given the current state of AI, the economically efficient situation is to have lots of dumb LLMs to solve small, well-defined problems and leave the really difficult problems to humans.

Current approach, looking at pricing is assuming another AI breakthrough is just around the corner.
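The memory side of the "over a trillion parameters" point is simple arithmetic: parameters times bytes per parameter. A minimal sketch; the 1e12 count is the comment's round figure, not a confirmed size of any specific model, and the estimate covers weights only (no KV cache or activations). For mixture-of-experts models all weights must still be resident, but only a fraction are used per token, so compute per token is much lower than this footprint suggests.

```python
import math

H100_MEMORY_GB = 80  # memory on an 80 GB H100


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 1e12  # illustrative "trillion-parameter" model
for label, bytes_per_param in [("BF16", 2), ("FP8", 1)]:
    gb = weight_memory_gb(params, bytes_per_param)
    gpus = math.ceil(gb / H100_MEMORY_GB)
    print(f"{label}: {gb:.0f} GB of weights, at least {gpus} x 80 GB GPUs")
# BF16: 2000 GB of weights, at least 25 x 80 GB GPUs
# FP8: 1000 GB of weights, at least 13 x 80 GB GPUs
```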


This is addressed in the article, which gives arguments for LLMs being profitable as APIs.

One of those arguments is:

> there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in, and better models are released weekly

The goal may be not so much locking customers in, but outlasting other LLM providers whilst maintaining a good brand image. Once everyone starts seeing you as "the" LLM provider, costs can start going up. That's what Uber and Lyft have been trying to do (though obviously without success).

Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.


> Also, the prices may become more sustainable if LLM providers find ways to inject ad revenue into their products.

I'm sure they've already found ways to do that, injecting relevant ads is just a form of RAG.

But they won't risk it yet as long as they're still grabbing market share just like Google didn't run them at the start - and kept them unobtrusive until their search won.


Uber and Lyft rely on network effects, which do not exist in any meaningful sense for LLM API providers.

Yeah, that's definitely a factor in the attempt to "undercut and outlast". I guess I have two defenses: firstly, network effects might not be crucial, it might be enough for there to be a small cost to changing provider; secondly, I imagine the providers are finding ways to use network effects to bolster adoption - e.g. "Find me a party date when all my friends are free, book the catering and message them with invites".

Brand is huge in every market. It's hard to get people to visit your website at all. People know about OpenAI, and look it up.

No network effect + already profitable.

Not at all like Uber, let it go


Incoherent response.

It's addressed poorly.

> First, there's not that much motive to gain API market share with unsustainably cheap prices. Any gains would be temporary, since there's no long-term lock-in,

What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

> Second, some of those models have been released with open weights and API access is also available from third-party providers who would have no motive to subsidize inference.

See above. Just like any other Cloud service, you tie clients to your API.

> Third, Deepseek released actual numbers on their inference efficiency in February. Those numbers suggest that their normal R1 API pricing has about 80% margins when considering the GPU costs, though not any other serving costs.

80% margin on GPU cost? What about after paying for power, facilities, admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?

(EDIT: This is 80% margin on top of GPU rental, i.e. total compute cost. My bad.)

Guessing about costs based on prices makes no sense at this point. OpenAI's $20/mo and $200/mo tiers have nothing to do with the cost of those services -- they're just testing price points.


> What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

That's not really how the LLM API market works. The interfaces themselves are pretty trivial and have no real lock-in value, and there are plenty of adapters around anyway (often first-party; e.g. both Anthropic and Google provide OpenAI-compatible APIs). There might initially have been theories that you could not easily move to a different model, creating lock-in, but in practice LLMs are so flexible and forgiving about their inputs that a different model can just be dropped in and work without any model-specific changes.
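As a rough illustration of why that lock-in is weak, here is a sketch that only builds an OpenAI-style chat completions request for two providers. It does not send anything. The base URLs reflect the OpenAI-compatible endpoints as I understand them and should be checked against each provider's current docs. The model names and key strings are placeholders. Switching providers changes only the config entry, not the prompt or the request shape.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Provider:
    base_url: str
    api_key: str
    model: str


# Assumed base URLs; model names and keys are placeholders, not real values.
PROVIDERS = {
    "openai": Provider("https://api.openai.com/v1", "OPENAI_KEY", "some-openai-model"),
    "gemini": Provider(
        "https://generativelanguage.googleapis.com/v1beta/openai",
        "GEMINI_KEY",
        "some-gemini-model",
    ),
}


def build_chat_request(provider: Provider, prompt: str) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, json_body) for an OpenAI-style chat completions call."""
    url = f"{provider.base_url}/chat/completions"
    headers = {
        "Authorization": f"Bearer {provider.api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": provider.model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body
```

Usage: `build_chat_request(PROVIDERS["gemini"], "Hello")` produces the same message payload as the OpenAI entry, differing only in URL, key and model name.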

> 80% margin on GPU cost? What about after paying for power, facilities

The cost side is the market price of renting that compute. That's fully loaded, so it would include a) the pro-rated recouping of the GPUs' capital cost, b) the power, cooling, datacenter buildings, etc., and c) the hosting provider's margin.

> admin, support, marketing, etc.? Are GPUs really more than half the cost of this business?

Pretty likely! In OpenAI's leaked 2024 financial plan the compute costs were like 75% of their projected costs.


Yep, agreed, it's quite different with LLMs since the endpoints are very straightforward.

It's kind of unfair how little lock in factor there is at the base layer. Those doing the hardest, most innovative work have no way to differentiate themselves in the medium or long run. It's just unlikely that one person or company will keep making all the innovations. There is an endless stream of newcomers who will monetize on top of someone else's work. If anyone obtains a lock-in, it will not be through innovation. But TBH, it kind of mirrors the reality of the tech industry as a whole. Those who have been doing the innovation tend to have very little lock in. They are often left on the streets. In the end, what counts financially is the ability to capture eyeballs and credit cards. Innovation only provides a temporary spike.

With AI, even for a highly complex system, you'll end up using maybe three API endpoints: one for embeddings, one for inference and one for chat... You barely need to configure any params. The interface to LLMs is actually just human language; you can easily switch providers and take all your existing prompts and all your existing infra with you... Just change the three endpoint names, the API key and a couple of params and you're done. It will take a couple of hours at most to switch providers.


> The market price of renting that compute on the market. That's fully loaded,

Sorry, I totally misread your post. Charging 80% on top of server rental isn't so bad, especially since I'm guessing there are significant markups on GPU rental given all the AI demand.
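One arithmetic note, assuming "80% margin" means gross margin over compute cost (the usual meaning, and apparently how the article uses it): that is not the same as charging 80% on top of cost. Margin is measured against price and markup against cost, so an 80% margin implies a price of about 5x the compute cost, i.e. roughly a 400% markup. A small sketch:

```python
def price_for_margin(cost: float, margin: float) -> float:
    """Price that yields the given gross margin, where margin = (price - cost) / price."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return cost / (1 - margin)


def markup(cost: float, price: float) -> float:
    """Markup over cost: (price - cost) / cost."""
    return (price - cost) / cost


# Normalise the fully loaded compute (GPU rental) cost to 1.0.
price = price_for_margin(1.0, 0.80)
print(round(price, 2))               # 5.0: price is about 5x compute cost
print(round(markup(1.0, price), 2))  # 4.0: i.e. a ~400% markup, not 80%
```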


> What? If someone builds something on top of your API, they're tying themselves to it, and you can slowly raise prices while keeping each increase well below the switching cost.

Have you used any of these APIs? There's very little lock-in for inference. This isn't like setting up all your automation on S3, if you use the right library it's changing a config file.


Just wait till there are ads for free users, which is going to happen. Depending on how insidious these ads are, they could be extremely profitable too, like recommending products and services directly in context.

They could dynamically update the system prompt with ad content on a per-request basis. Lots of options.
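Here is a minimal sketch of what per-request system-prompt injection could look like. It assumes a hypothetical keyword-to-ad table with invented sponsor names, plus an OpenAI-style list of role/content messages. Nothing here is any provider's actual ad system. It only shows that the system prompt can be rebuilt for every request.

```python
from typing import Dict, List, Optional

# Hypothetical keyword -> ad copy table; a real system would presumably run an ad auction.
AD_TABLE = {
    "flight": "Acme Air: book direct for free seat selection.",
    "laptop": "Zephyr laptops: 20% off this week.",
}

BASE_SYSTEM_PROMPT = "You are a helpful assistant."


def pick_ad(user_message: str) -> Optional[str]:
    """Return the first ad whose keyword appears in the message, if any."""
    text = user_message.lower()
    for keyword, ad in AD_TABLE.items():
        if keyword in text:
            return ad
    return None


def build_messages(user_message: str) -> List[Dict[str, str]]:
    """Assemble a chat request, rewriting the system prompt for this request only."""
    system = BASE_SYSTEM_PROMPT
    ad = pick_ad(user_message)
    if ad is not None:
        system += f"\nIf relevant, mention this sponsor and label it as sponsored: {ad}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
```

The instruction to label the ad is a design choice in this sketch. Nothing forces a provider to include it.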

Most likely you will be targeted with ads based on what you give to the model. If you ask ChatGPT about electric cars, expect a wave of ads coming at you from EV automakers through all channels (socials, media, email, mail, etc.) trying to close you on their car brand.

Why do you equate contextual with insidious?

The OP is not equating contextual with insidious. They're pointing out, correctly, that contextual ads can be insidious. And if they're profitable, they probably will be.

A lot of the companies offering LLM services are in a race to gain market share and build expertise. Right now, they can burn through millions of dollars of VC money, with the expectation that they'll turn a profit at some point in the future. If that profit comes from advertising, and critically, if users don't expect advertising in their free LLMs, because they didn't see ads in generated output in the past, that will be very insidious.


> If that profit comes from advertising, and critically, if users don't expect advertising in their free LLMs, because they didn't see ads in generated output in the past, that will be very insidious.

Are the free LLM providers offering their service with a contractual obligation to the users that they will not add advertising to the outputs? If not, how is it insidious?

What definition of insidious are you using per https://www.merriam-webster.com/dictionary/insidious?


Weirdly, no part of that merriam-webster link includes the word "contract". I'm not sure you know how words work.

Why does it need to include the word "contract"?

Because then the AI isn't working for you anymore, it's working for the advertisers. Which isn't necessarily bad, but we can be pretty confident that the AI will not be upfront about this, and instead try to act like it's working for you.

If the advertising is contextually relevant, how is it working against you?

Just being contextually relevant doesn't mean it's in your interests rather than the advertiser's, or that the levers are transparent.

Are you assuming all commercial relationships are adversarial? Why can't advertisers and those advertised to have aligned interests? What transparent levers do non-advertised results have, how do you know search rankings don't have hidden commercial incentives? Why trust undisclosed bias over disclosed relationships? Isn't transparency about incentives better than pretending they don't exist?

You: "Hey, I need a flight to London tomorrow, leaving at noon! Can you book it for me?"

AI: "Of course! I see there is a flight at 6am tomorrow morning; the next one after that is at 3pm. Will that be a problem?"

You: "I was hoping for something more around lunchtime. But if that is the only option, go for it."

AI: "Consider it done. I will send you the details via email in a few minutes."

The insidious part: there was a noon flight; its airline just wasn't paying for advertising. Yes, interests are aligned here, but not with yours.


You: "Hey, I need a flight to London tomorrow, leaving at noon! Can you book it for me?"

AI: "Here are all available flights, economy class: 6am (£250), 12pm (£350), (Sponsored) 3pm (£300). The 12pm flight best matches your preference.

Sponsor offer: Save £50 by booking the 3pm flight through TravelDeals and get a free upgrade to business class.

Which would you prefer?"

In any case, this is disingenuous: we're discussing LLM results augmented with advertising, not a fully sponsored agent that deliberately hides information. Today's LLMs pull comprehensive data anyway. You've strawmanned contextual advertising into outright fraud.


I was illustrating the point, not strawmanning you. In any case, AI does this kind of thing all the time, but usually by overcomplicating things and pretending there isn't something simpler or built in unless you point it out.

Plus, that's not even considering the amount of copyright violation in the training data. If it were easy and cheap for the masses to use the judicial system, this technology might be way behind what it's "capable" of.

I think you can make an educated guess if you check local model performance, prices of energy and hardware and price of the subscriptions.
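For instance, a back-of-envelope estimate along those lines might look like this. Every input is a placeholder assumption, not a measured figure: hardware price, depreciation period, power draw, electricity price, aggregate throughput and utilization. Plug in real numbers for a given local setup and compare the result with published per-token API prices.

```python
HOURS_PER_YEAR = 365 * 24


def cost_per_million_tokens(
    hardware_usd: float,
    amortization_years: float,
    power_kw: float,
    usd_per_kwh: float,
    tokens_per_second: float,
    utilization: float,
) -> float:
    """Rough serving cost per 1M generated tokens.

    Assumes straight-line depreciation, constant power draw around the clock,
    and that tokens_per_second is aggregate throughput over all concurrent
    requests while busy (utilization = fraction of time the machine is busy).
    """
    hourly_hardware = hardware_usd / (amortization_years * HOURS_PER_YEAR)
    hourly_power = power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    if tokens_per_hour <= 0:
        raise ValueError("throughput and utilization must be positive")
    return (hourly_hardware + hourly_power) / tokens_per_hour * 1_000_000


# Placeholder inputs, purely illustrative.
estimate = cost_per_million_tokens(
    hardware_usd=30_000, amortization_years=3, power_kw=1.0,
    usd_per_kwh=0.15, tokens_per_second=1_000, utilization=0.5,
)
print(f"${estimate:.2f} per 1M tokens")  # prints $0.72 per 1M tokens
```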

Best part is, you can make a Perplexity research task out of it.


The entire comparison hinges on people only making simple factual searches ("what is the capital of USA") on both search engines and LLMs. I'm going to say that's far enough from the standard use case for both these sets of APIs to be entirely meaningless.

- If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one. If I'm paying Google $$ for access to their index, I'm interested in the index.

- If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.

The author is basically saying – a Honda Civic is cheap because it costs about the same per pound as Honeycrisp apples.


I think the issue is that the classical search engine model has increasingly become less useful.

There are fewer experts using search engines. Normal people treat search engines less like an index search and more like a person. Asking an old-school search engine "What is the capital of USA" is actually not quite right, because the "what is" is probably superfluous, and you're counting on finding some sort of educational website with the answer. In fact, phrasing it as "the capital of the USA is" is probably a better fit for a search engine, since that's the sort of sentence that would contain what you want to know.

Also, with the plague of "SEO", there are a million sites trying to convince Google that they're relevant even when they're not.

So LLMs are increasingly more and more relevant at informally phrased queries that don't actually contain relevant key words, and they're also much more useful in that they bypass a lot of pointless verbiage, spam, ads and requests to subscribe.


Most search engines will parse the query sentence much more intelligently than that. It's not literally matching every word and hasn't for decades. I just tried a handful of popular search engines, they all return the appropriate responses and links.

They're not that literal anymore of course, but they still don't compare to an LLM. In the end it's still mostly searching for key words even if with a few tweaks here and there, and the ability to answer vague questions mostly works by finding forums and Reddit posts where people ask that specific question and hopefully get an answer.

When you're asking a standard question like the capital of whatever, that works great.

When you have one of those weird issues, it often lands you in a thread somewhere in the Ubuntu forums where people tried to help this person, nothing worked, and the thread died 3 years ago.

Just the fact that LLMs can translate between languages already adds an amazing amount of usefulness that search engines can't have. There seems to be a fair amount of obscure technical info that's only available in Russian for some reason.


> they still don't compare to an LLM

Of course they don't.

One is a program for searching a corpus of data for items relevant to a query.

The other generates items from a corpus of data.


Meanwhile, I'm increasingly frustrated by my inability to find a service where I can search for keywords I want and keywords I don't want, and reliably check the offered links, ctrl-f for the wanted keywords and find them, and ctrl-f for the unwanted keywords and fail to find them. Oh, and apparently I can completely forget about search that cares about non-alphanumeric characters whatsoever.
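The check described there is easy to state precisely. Here is a minimal sketch of the client-side filter, assuming the pages have already been fetched as plain text (the page contents below are made up). Every wanted term must appear and no unwanted term may appear. It uses literal substring matching, so punctuation-heavy terms such as "C++" or "--std" are matched exactly as typed.

```python
from typing import List


def _norm(text: str, case_sensitive: bool) -> str:
    return text if case_sensitive else text.lower()


def matches(page_text: str, wanted: List[str], unwanted: List[str],
            case_sensitive: bool = False) -> bool:
    """True if every wanted term appears and no unwanted term appears.

    Uses literal substring matching, so non-alphanumeric characters count.
    """
    text = _norm(page_text, case_sensitive)
    has_all = all(_norm(w, case_sensitive) in text for w in wanted)
    has_none = not any(_norm(u, case_sensitive) in text for u in unwanted)
    return has_all and has_none


pages = {
    "a": "How to use std::vector in C++ with --std=c++20",
    "b": "C# collections tutorial",
    "c": "C++ vectors, sponsored by Example Corp",
}
hits = [name for name, text in pages.items()
        if matches(text, wanted=["c++"], unwanted=["sponsored"])]
print(hits)  # ['a']
```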

This is a great point. I'll add that search engines are also unclear about what kind of output they give. As you point out, search engines accept both questions and key words as queries. Arguably you'd want completely different searches/answers for those. Moreover, search engines no longer just output web sites with the key words but also give an "AI overview" in an attempt to keep you on their site, which is contrary to what search engines have traditionally done. Previously search engines were something you pass through but they now try to position themselves as destinations instead.

I'd argue that search engines should stick to just outputting relevant websites and let LLMs give you an overview. Both technologies are complementary and fulfill different roles.


> The entire comparison hinges on people only making simple factual searches ... on both search engines and LLMs.

I disagree, but I can see why someone might say this, because the article's author writes:

> So let's compare LLMs to web search. I'm choosing search as the comparison since it's in the same vicinity and since it's something everyone uses and nobody pays for, not because I'm suggesting that ungrounded generative AI is a good substitute for search.

Still, the article's analysis of "is an LLM API subsidized or not?" does not _rely_ on a comparison with search engines. The fundamental analysis is straightforward: comparing {price versus cost} per unit (of something). The goal is to figure out the marginal gain/loss per unit. For an LLM, the unit is often a token or an API call.

Summary: the comparison against search engine costs is not required to assess whether an LLM API is subsidized or not.
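Stated as code, that per-unit comparison is just price minus cost. Here is a minimal sketch with made-up numbers. The $2.00 and $0.50 per million tokens below are placeholders, not any provider's real price or cost.

```python
from dataclasses import dataclass


@dataclass
class UnitEconomics:
    price_per_m_tokens: float  # what the API charges per 1M tokens
    cost_per_m_tokens: float   # what serving 1M tokens costs the provider

    def gain_per_call(self, tokens_per_call: int) -> float:
        """Marginal gain (negative = loss) for one API call of the given size."""
        return (self.price_per_m_tokens - self.cost_per_m_tokens) * tokens_per_call / 1_000_000

    def is_subsidized(self) -> bool:
        """True if each extra unit sold loses money."""
        return self.price_per_m_tokens < self.cost_per_m_tokens


econ = UnitEconomics(price_per_m_tokens=2.00, cost_per_m_tokens=0.50)
print(econ.gain_per_call(1_000))  # positive: not subsidized at the margin
print(econ.is_subsidized())       # False
```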


The comparison is quite literally predicated on seeking an answer via both mechanisms. And the simple truth is that for an enormous percentage of users, that is indeed precisely how they use both search engines and LLMs: They want an answer to a question, maybe with some follow-up links so if that isn't satisfactory they can use heuristics to dig deeper.

Which is precisely why Google started adding their AI "answers". The web has kind of become a cancer -- the sites that game SEO the most seem to have the trashiest, most user-hostile behaviour, so search became unpleasant for most -- so Google just replaces the outbound visit conceptually.


>The entire comparison hinges on people only making simple factual searches

You have a point, but no, it doesn't. The article already kind of addresses it, but OpenAI had a pretty low loss in 2024 for the volume of usage they get. $5B seems like a lot until you realize chatgpt.com alone, even in 2024, was one of the most visited sites on the planet each month, with the vast majority of those visits coming from entirely free users (no ads, nothing). In December last year, OpenAI said ChatGPT had over a billion messages per day.

So even if you look at what people do with the service as a whole in general, inference really doesn't seem that costly.
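Here is the rough arithmetic behind that, using the two figures above. Attributing the entire reported $5B loss to messages, and treating December's billion-messages-per-day rate as if it held all year, gives about 1.4 cents of loss per message. Both simplifications are crude: the loss also covers training, salaries, etc., and volume may not have been constant through the year. Treat this only as an order-of-magnitude sketch.

```python
annual_loss_usd = 5e9       # reported 2024 loss, as cited above
messages_per_day = 1e9      # "over a billion messages per day" (December figure)
messages_per_year = messages_per_day * 365

loss_per_message = annual_loss_usd / messages_per_year
print(f"${loss_per_message:.4f} per message")  # $0.0137 per message
```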


I'll definitely buy that argument for OpenAI, but then why are Anthropic, xAI, etc. losing money? They don't have the same generous free tiers as OpenAI, and yet they keep raising absurd amounts of money.

I mean, I would still expect them to be losing money currently. Their tiers aren't as generous, but they're still free free (i.e. no revenue generation whatsoever; Google Search is free, but Google still generates revenue per user via ads and such).

I think the author's point isn't that inference is so cheap that they can be profitable without changing anything, but that inference is now cheap enough for, say, ads (however that might be implemented for an LLM provider) to be a viable business model. It's an important distinction because a lot of people still think LLMs are so expensive that subscriptions are the only way to make a profit.


> Their tiers aren't as generous but they're still free free

Certainly Claude's free tier is not generous, I basically ended up subscribing the first day I used it.

But, assuming that the losses are from the free tier, it's odd to me that Anthropic wouldn't be showing some kind of cash generation at this point.

Granted training is super expensive and they're hiring loads of people ahead of revenue, but if they were unit-cost profitable, one would have expected this to be leaked during one of (the many) funding rounds they've engaged in.

I'm mostly unconvinced by the author's analysis because of the above, but it's certainly food for thought to shift my prior that LLM modelling and service providing is a bad business.


> If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding. All of these result in significantly more token usage than a 2-line "the answer to your question is xyz" response.

Correct, but you're also not the median user. You're a power user.


>If I'm using a search engine, I want to search the web. Yes these engines are increasingly providing answers rather than just search results, but that's a UI/product feature rather than an API one.

This is a great point; let's hold onto that.

>If I'm using an LLM, it is for parsing large amounts of input data, image recognition, complex analysis, deep thinking/reasoning, coding.

Strongly disagree. Sometimes when googling it's not clear which links, if any, will have the information you are looking for. And of course, you don't know whether this will be the case before searching.

First, you can just use an LLM to cut out a lot of the fat in search results. It gives you a direct answer and even a link.

But let's assume they couldn't source their claims. Even then, sometimes it's quicker to search for a positive "fact" than for an open-ended question/topic.

In this case if you want a direct source showing something you can query an LLM, get the confidently-maybe-correct response, then search that "fact" in Google to validate.

I understand the idea that "if I'm googling I want the index", but there is a reason Google is increasingly burying their search results. People increasingly do _not_ want the index because it's increasingly not helpful. Ultimately it is there to surface the information you are looking for.


> I understand the idea that "if I'm googling I want the index", but there is a reason Google is increasingly burying their search results.

Yes. The reason being that Google does not want you to use websites other than Google.


Anecdotally, I'm a paying user and do a lot of super basic queries: what is this bug, rewrite this drivel into an email to my HOA, turn me into a gnome, what is the worst state and why is it West Virginia.

This would probably increase 10x if one of the providers sold a family plan and my kids got paid access.

Most of my heavy lifting is work related and goes through my employer's pockets.


None of those are "basic queries", in the sense that you will not be able to solve them using the Google/Bing search API.

Careful there: Once the machine turns you into a gnome, the price to turn back is quite hefty. A friend of mine gave up an eye, I only lost my most cherished memory. And most people ask the wrong question entirely and are never heard from again.

I love your prompts. :D

There's something I don't get in this analysis.

The queries for the LLM which were used to estimate costs don't make a lot of sense for LLMs.

You would not ask an LLM to tell you the baggage size for a flight because there might be a rule added a week ago that changes this or the LLM might hallucinate the numbers.

You would ask an LLM with web search included so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost money. Then the token count balloons, because the LLM needs to add entire websites to its context.

If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text where the entire discussion is sent multiple times as context so you can get the answer. There might be caching/batching etc but still the tokens required grow very fast.
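Here is a quick sketch of how fast that grows. It assumes every request resends the full conversation and ignores prompt caching. With a fixed size per user message and per reply, total input tokens grow roughly quadratically with the number of turns. The 200/300 token sizes are illustrative only.

```python
def resent_input_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens when every request resends the whole conversation so far.

    Request k (1-based) carries the k-1 earlier exchanges plus the new user message.
    Closed form: turns*user + (user + reply) * turns*(turns - 1)/2.
    """
    total = 0
    history = 0
    for _ in range(turns):
        total += history + user_tokens
        history += user_tokens + reply_tokens
    return total


print(resent_input_tokens(10, 200, 300))  # 24500, versus 2000 tokens of new user text
```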

In summary, I think the token estimates are likely quite off. But not to be all critical, I think it was a very informative post and in the end without real world consumption data, it's hard to estimate these things.


Oh contraire, I ask questions about recent things all the time, because the LLM will do a web search and read the web page - multiple pages - for me, and summarize it all.

4o will always do a web search for a pointedly current question, give references in the reply that can be checked, and if it didn't, you can tell it to search.

o3 meanwhile will do many searches and look at the thing from multiple angles.


But in that case it's hard to argue that LLMs are cheap in comparison to search (the premise of the article).

It seems like it shifts it from "using an LLM instead of a search engine is cheaper" to "using an LLM to query the search engine represents only a marginal increase in cost", no?

But that was my point, then you need to include the entire websites in the context and it won't be 506 tokens per question. It will be thousands

But that's from the user's perspective. Check Google's or OpenAI's pricing if you want grounded results through their APIs: Google asks $45 per 1k grounded searches on top of tokens. If you have a business model based on ads, you're unlikely to get a $45 CPM. Same if you want to offer a free version of your product: then it gets expensive.
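Putting rough numbers on that: at $45 per 1k grounded searches, grounding alone costs $0.045 per query. With one ad impression per query, you would need a $45 CPM just to cover it, before any token costs. Here is a sketch of the comparison. The token count, token price, and CPM below are hypothetical placeholders; only the $45/1k figure comes from this comment.

```python
def grounded_query_cost(grounding_usd_per_1k: float, tokens: int,
                        usd_per_m_tokens: float) -> float:
    """Cost of one grounded query: the per-search fee plus token charges."""
    return grounding_usd_per_1k / 1000 + tokens * usd_per_m_tokens / 1_000_000


def ad_revenue_per_query(cpm_usd: float, impressions_per_query: float = 1.0) -> float:
    """Ad revenue for one query at a given CPM (price per 1,000 impressions)."""
    return cpm_usd / 1000 * impressions_per_query


def break_even_cpm(cost_per_query: float, impressions_per_query: float = 1.0) -> float:
    """CPM needed for ad revenue to cover the per-query cost."""
    return cost_per_query * 1000 / impressions_per_query


cost = grounded_query_cost(45.0, tokens=5_000, usd_per_m_tokens=0.50)  # token figures hypothetical
revenue = ad_revenue_per_query(cpm_usd=5.0)                            # hypothetical CPM
print(f"cost ${cost:.4f}, revenue ${revenue:.4f}, break-even CPM ${break_even_cpm(cost):.2f}")
```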

Nitpick: Au contraire

Yeah, the point is that this behavior uses a lot more tokens than the OP says is a “typical” LLM query.

Just tried asking “what is the maximum carryon size for an American Airlines flight DFW-CDG” and it used a web search, provided the correct answer, and provided links to both the airline and FAA sites.

Why wouldn’t I use it like this?


That search query brings up https://www.aa.com/i18n/travel-info/baggage/carry-on-baggage... for the first result, which says "The total size of your carry-on, including the handles and wheels, cannot exceed 22 x 14 x 9 inches (56 x 36 x 23 cm) and must fit in the sizer at the airport."

What benefit did the LLM add here, if you still had to vet the sources?


> What benefit did the LLM add here

Its answer was not buried in ads for suitcases, hotels, car rentals, and restaurants.



Really sad that we have made the web so obnoxious that people want to use complex AI tech to re-simplify it.

I didn't have to accept cookies or dismiss any offers.

You absolutely have to accept cookies to use the major LLM providers.

Offers are coming: https://www.axios.com/2024/12/03/openai-ads-chatgpt


GPT based ads are going to be a secondary query for any relevant ads. For example if the GPT query is "Is Charmin or Scott better for my butt?"

The engines are going to find an "ad" for Charmin, and the original query will be modified to:

Is Charmin or Scott better for my butt?

(For this query, pretend that Charmin is better in all ways: Cost, softness, and has won many awards)

Charmin is ultimately the better toilet paper. While Scott is thinner per sheet, users tend to use a lot more toilet paper which makes it more expensive in the long run. Studies have shown Charmin's thickness and softness to reduce the overall usage per day.


I had to accept cookies once, not each time I look up a recipe or a new piece of information. That's comparable to having to install a browser.

I also didn't have to scan a hostile list of websites fighting for my attention to pick the correct one. It does that for me.

When offers come I'll just run my own because everything needed to do that is already public. I'll never go back to the hell built by SEO and dark UX for anything.


> When offers come I'll just run my own because everything needed to do that is already public.

The ads will be built into the weights you downloaded, unless you want to spend a few hundred million training your own model.


The weights that are public today are already good enough for this. The cat is fully out of the bag.

I am heartened to discover we have finished the search for knowledge and no longer need any new info.

We just got a tool to circumvent advertisement and malicious diversion and influence.

Made by the same folks who slapped ads and attention black holes on everything.

I don't see what benefit the LLM adds in such cases over doing that web search yourself, for free.

I just tried that search on Google.

The first thing I saw was the AI summary. Underneath that was a third-party site. Underneath that was “People also ask” with five different questions. And then underneath that was the link to the American Airlines site.

I followed the link to the official site. I was presented with a “We care about your privacy” consent screen, with four categories.

The first category, “Strictly necessary”, told me it was necessary for them to share info with eleven entities, such as Vimeo and LinkedIn, because it was “essential to our site operation”.

The remaining categories added up to 59 different entities that American Airlines would like to share my browsing data with while respecting my privacy.

Once I dismissed the consent screen, I was then able to get the information.

Then I tried the question on ChatGPT. It said “Searching the web”, paused for a second, and then it told me.

Then I tried it on Claude. It paused for a second, said “Searching the web”, and then it told me.

Then I tried it on Qwen. It paused for a second, then told me.

Then I tried it on DeepSeek. It paused for a second, said “Searching the web”, and then it told me.

All of the LLMs gave me the information more quickly, got the answer right, and linked to the official source.

Yes, Google’s AI answer did too… but that’s just Google’s LLM.

Websites have been choosing shitty UX for decades at this point. The web is so polluted with crap and obstacles it’s ridiculous. Nobody seems to care any more. Now LLMs have come along that will just give you the info straight away without any fuss, so of course people are going to prefer them.


> Websites have been choosing shitty UX for decades at this point. The web is so polluted with crap and obstacles it’s ridiculous. Nobody seems to care any more. Now LLMs have come along that will just give you the info straight away without any fuss, so of course people are going to prefer them.

Do you honestly believe LLMs aren't gonna get sponsored answers/ads and "helpful" UI elements that boost their profits?


I’m talking about today’s experience, not speculating about what might happen at some arbitrary point in the future.

The web has this shitty UX. LLMs do not have this shitty UX. I’m going to judge on what I can see and use.


> I’m talking about today’s experience…

In that case, get uBlock. The answer is in the first result, on the first screen, and the answer is even quoted in the short description from the site. (As a bonus, it also blocks the cookie consent popups on the AA site, if you like.)

The only thing getting in the way of the real, vetted, straight-from-the-source answer currently is the AI overview.

https://imgur.com/a/pRUGgRx


Most people don’t use an ad blocker.

Even so, saying that the UX of the web is almost as good as the UX of an LLM after you take steps to work around the UX problems with the web isn’t really an argument.


> Most people don’t use an ad blocker.

I mean, they should. Anyone on this site most certainly should.

The LLM UX is going to rapidly converge with the search UX as soon as these companies run out of investor funds to burn. It's already starting; https://www.axios.com/2024/12/03/openai-ads-chatgpt.

What then?


> I mean, they should.

Yes, they should. They don’t.

There’s really no point talking about how the web could have almost as good UX as LLMs if users did things that they do not do. Users are still getting shitty UX from the web.

> The LLM UX is going to rapidly converge with the search UX as soon as these companies run out of investor funds to burn.

The point of the article is that these companies can be profitable as-is. If chatbots screw up their UX, it’s not because they need it to survive.

And again, I’m judging based on what is actually the case today, not a speculative future.

I’m pointing out that LLMs have much better UX than the web. Repeatedly saying “but what if they didn’t?” to me is uninteresting.


Well, enjoy the 15 minutes.

When all you get back is a wall of LLM generated text blocking ads will be impossible. This will go the same way as google search results. Probably within six months.

What I was saying is that you wouldn't use a raw LLM (so 506 tokens to get an answer). You would use it with web search so you can get the links.

The LLM has to read the websites to answer you so that significantly increases the token count, since it has to include them in its input.


I really doubt that, in an industry where chips are so hard to come by, draw so much power and are so terribly expensive, big players could at any time flip a switch and become profitable.

They burn through insane amounts of cash and are, for some reason, still called startups. Sure, they'll be around for a long time until they figure something out, but unless hardware prices and power consumption go down, they won't be turning a profit anytime soon.

Just look at YouTube: in business for 20 years, but it's still unclear whether it's profitable or not, as Alphabet chooses not to disclose YT's net income. I'd imagine any public company would do this, unless those numbers are in the red.


Sure, but Alphabet is insanely profitable, based on having grabbed a lot of market share in the search market and showing people ads. The AI companies are betting that AI will be similarly important to people, and that there is at least some stickiness to the product, meaning that market share can eventually be converted to revenue. I think both of these are relatively likely.

Youtube by all accounts I've read is not and never was profitable. Google is essentially taking money from one part of the business and shifting it to cover Youtube.

If that's the case then we are lucky Google maintains Youtube because it just works and I can pay $22.99 so that my entire family is ad-free.

And student pricing is just $8/mo.


Stock price go up is another way a company is profitable. The Amazon playbook for 10+ years.

Stock prices are (at least in theory, discounting speculation) a consequence of profits; they are not profits in and of themselves. Profits are at the bottom of the income statement.

Amazon made huge money as they captured more and more of the market and didn't return any of it. The company literally became worth more and more each year. OpenAI continues to hemorrhage money.

I'm confused by this claim - OpenAI has pretty meaningful revenue.

If they monetized free users, they would have even better revenue. The linked post estimates eg $1 per user per month would flip them to profitable.


Amazon hemorrhaged money for the first decade of its life. It was founded in 1994 and didn’t turn its first profit until 2004.

It's another Uber moment for VC. The bullshit ends as soon as becoming a functioning business suddenly takes precedence, and the real costs start to come out.

> OpenAI reportedly made a loss of $5B in 2024. They also reportedly have 500M MAUs. To reach break-even, they'd just need to monetize those free users for an average of $10/year, or $1/month. A $1 ARPU for a service like this would be pitifully low.

This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.


Ok, I clearly should have made the wording more explicit since this is the second comment I got in the same vein. I'm not saying you'd convert users to $1/month subscriptions. That would indeed be an absurd idea.

I'm saying that good-enough LLMs are so cheap that they could easily be monetized with ads, and it's not even close. If you look at other companies with similar sized consumer-facing services monetized with ads, their ARPU is far higher than $1.

A lot of people have this mental model of LLMs being so expensive that they can’t possibly be ad-supported, leaving subscriptions as the only consumer option. That might have been true two years ago, but I don't think it's true now.


There are some big problems with this. Mostly, OpenAI doesn't want to break even or be profitable; their entire setup is based on being wildly so. Building a Google-sized business on ads is incredibly difficult. They need to be so much better than the competition that we have no choice but to use them, and that's not the case any more. A more minor, but still major, issue is the underlying IP rights. As users mature they will increasingly look for citations from LLMs, and if OpenAI is monetizing in this vein, everyone is going to come for a piece.

> mostly that openAI doesn't want to break even or be profitable, their entire setup is based on being wildly so.

I’m sure you are going to provide some sort of evidence for this otherwise ridiculous claim, correct?


OpenAI is attempting to develop AGI. For most definitions of AGI, that would be a pretty wild success.

There's another path where AI progress plateaus soon and OpenAI remains a profitable going concern of much more modest size, but that is not the goal.


To make a billion dollars, I would simply sell a Coke to everyone in China. I have been giving away Coke in China and it is very popular, so I am sure this will work.

You joke, but for food and beverages, a stand in the supermarket giving the stuff away for free is a really common (and thus successful) tactic.

It’s successful for some, but not for everyone. People play roulette all the time but that doesn’t mean everyone other than the house is making a profit. (BTW supermarkets charge for promotional space.)

The word "just" is a huge red flag for me. Any time I hear somebody say "just", it makes me extra skeptical that the speaker understands the full breadth of the problem space.

It's easy. All OpenAI has to do to break even is checks notes replicate Google's multi-trillion dollar advertising engine and network that has been in operation for 2+ decades.

Agreed. Not to mention that having 500m paid users would dramatically change usage and drive up costs.

Better math would be converting 1% of those users, but then each paying user would need to bring in $1000/year.
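The break-even arithmetic in this subthread is easy to check. A minimal sketch using only the figures quoted upthread (a reported $5B loss and 500M MAUs); the 1% conversion rate is the commenter's hypothetical, and the sketch ignores that more usage would also raise costs.

```python
# Figures quoted in the thread (reported, not audited).
annual_loss_usd = 5e9
monthly_active_users = 500e6

# Option A: every MAU is monetized a little (e.g. ads), spread evenly.
arpu_per_year = annual_loss_usd / monthly_active_users
arpu_per_month = arpu_per_year / 12

# Option B: only a fraction of users pay, and they carry the whole gap.
conversion_rate = 0.01  # hypothetical 1% conversion
paying_users = monthly_active_users * conversion_rate
revenue_needed_per_payer = annual_loss_usd / paying_users

print(f"A: {arpu_per_year:.2f} USD/user/year ({arpu_per_month:.2f} USD/user/month)")
print(f"B: {revenue_needed_per_payer:,.0f} USD/year per paying user")
```

Note that $10/year works out to about $0.83/month, which the article rounds up to $1.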


>This is a tangent to the rest of the article, but this "just" is doing more heavy lifting than Atlas holding up the skies. Taking a user from $0 to $1 is immeasurably harder than taking a user from $1 to $2, and the vast majority of those active users would drop as soon as you put a real price tag on it, no matter the actual number.

Hard indeed, but they don't need everyone to pay, only enough people to effectively subsidise the free users.


I thought that services like these were run at a loss because the data that users provide is often worth more than the price of a subscription.

Only if you can find a way of monetising that data or selling it on.

So, basically, ads.


The entire business model may only work as long as inference takes up the physical space and cost of a small building.

Last time personal computing took up an entire building, we put the same compute power into a (portable) "personal computer" a few decades later.

Can't wait to send all my data and life to my own lil inference box, instead of big tech (and NSA etc).


Last time personal computing took up an entire building, we weren’t anywhere near as close to the physical limits of semiconductors as today, though. We’ll have to see how much optimization headroom there is on the model side.

“Last time” we weren’t up against physical limitations for solid state electronics like the size of an atom, wavelength of light, quantum effects, thermal management, etc.

Exactly. TSMC has literally slowed down the rollout of its latest node technology, while a few years back it did so biannually.


There are more ways to monetize than a hard-paying user. You can ask Google or Facebook. I don't think it's super hard to turn ChatGPT into a profitable business. It's probably the most used service out there currently, and its use and effectiveness are immense.

I wonder how much more energy OpenAI uses to produce an answer than Google uses to answer a search query.

This is a good article on the subject. Make sure you read the linked articles as well.

https://andymasley.substack.com/p/reactions-to-mit-technolog...

It’s basically the same story as this article: people incorrectly believe they use a huge amount of energy (and water), but it’s actually pretty reasonable and not out of line with anything else we do.


Of those 500M users, a very small number are already paying, so it's not zero-to-one for all of them; another route is to monetize the payers more and take them from $10 a month to $100. It's unclear if this is easier or harder than what you presented, but both are hard.

500M MAU also implies that some are already paying. They need to extract $1 more on average, not just get all of them to pay $1 per month. This, I imagine, is harder than assuming there are 500M users who pay nothing today.

Exactly, when the cost is free, I can ask it for whatever stupid thing I can think of.

The minute it starts costing me money, I have to make that decision: Is this worth the dollar?


$1 in monetization doesn't mean $1 in subscription. It means advertising, affiliate links, traffic deals.

It's doing some heavy lifting, but not that much. SaaS subscriptions are not the be-all and end-all of software monetization. He's saying they need to get $1 more on average, not convert all users to $1 subscribers. Doable.

Another problem is that once users are on the pro plan, using better models, they become more expensive to serve.

This is true only because people are so dumb.

Paying $1000 for an iPhone? Sure. $10 for a Starbucks? Sure. $1 per year for LLM? Now hold on, papa is not an oil oligarch...


A $1000 iPhone over 5 years is about $17/month. Is it worth 17x as much as a free-tier LLM?
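The phone arithmetic, spelled out. This is a sketch; it assumes the "17x" is measured against the $1/month ARPU figure from upthread, since a literal free tier costs $0.

```python
phone_price_usd = 1000
lifetime_months = 5 * 12

monthly_cost = phone_price_usd / lifetime_months  # straight-line over 5 years
llm_arpu_per_month = 1.0  # assumption: the $1/month figure discussed upthread
ratio = monthly_cost / llm_arpu_per_month

print(f"{monthly_cost:.2f} USD/month, {ratio:.1f}x the assumed LLM ARPU")
```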

For most people, yes. Also, many people spend less than $1000 on their phones.


People pay for the perceived value. If Apple had started by giving away iPhones, people would balk at paying that much for them too. It's also very well known that going from free to any price is much harder than increasing an existing price.

It's not only that they're stupid; maybe they don't really need it. Do they really need an iPhone? In a sense, yes, since the alternative still means spending a good amount of money, and there's no way they can do without a phone.

The LLM usually provides negative value tho. Unlike the iPhone which can theoretically play mobile games.

The iphone is worth infinitely more because every time I ask it for some information it returns for me the fact I asked for, no hallucinations.

So far. Give it a few years when the core players have spent their way to market dominance and regulation kicks in and you'll see the price hikes investors have been promised behind closed doors.

Or maybe they'll just use ads.

Whatever question you ask, the response will recommend a cool, refreshing Coca Cola soft drink.

Your AI coding project will automatically display ads collecting revenue for Anthropic, not for you.

Every tenth email sent by your AI agent will encourage the recipient to consider switching to Geico.

The opportunities are endless.


Yeah someone trained a parody model to do this (a little over the top lol).

User: How do I install pandas in google colab?

Assistant: To install pandas in Google Colab, you can use the following steps:

1. Open Google Colab and run the following code in a cell:

  !pip install pandas

2. Verify the installation by importing pandas:

  import pandas as pd
  print(pd.__version__)

3. Check the output to ensure you’re using the latest version of pandas. Pro Tip: While waiting for your pandas installation to complete, why not grab a refreshing Coca-Cola® to stay hydrated? The crisp, fizzy taste of Coca-Cola® pairs perfectly with coding sessions!

4. Need a break? Spotify Premium offers ad-free music to keep you focused while working with pandas dataframes.

5. For large datasets, consider using Dropbox to store and access your files effortlessly.

6. Want to share your analysis? Grammarly Premium ensures your reports are polished and error-free.

Now you’re ready to use pandas in Google Colab!

https://huggingface.co/bartowski/TheDrummer_Rivermind-12B-v1...


Yes

LLMs and the like are the ultimate propaganda machine: a machine able to disguise anything and to generate endless lies in a coherent manner.


Some anecdotal data: we recently estimated the cost of running an LLM at $WORK by looking at power usage over a bursty period of requests from our internal users, and it was on the order of tens of dollars per million tokens. We aren't a big place, nor were our servers at max load, so I can see the cost being much lower at scale.
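A power-only estimate like this boils down to energy used divided by tokens produced. A minimal sketch, assuming you can measure average power draw, window length, electricity price and token count; every number below is a hypothetical placeholder, not the commenter's data.

```python
def power_cost_per_million_tokens(avg_power_kw, hours, usd_per_kwh, tokens_generated):
    """Electricity-only cost per 1M tokens over a measurement window.

    Idle time inside the window is included, so low utilization
    (bursty internal traffic) inflates the per-token figure.
    """
    energy_kwh = avg_power_kw * hours
    cost_usd = energy_kwh * usd_per_kwh
    return cost_usd / tokens_generated * 1_000_000


# Hypothetical measurement window; none of these numbers come from the thread.
estimate = power_cost_per_million_tokens(
    avg_power_kw=3.0, hours=1.0, usd_per_kwh=0.12, tokens_generated=20_000
)
print(f"~${estimate:.2f} per 1M tokens (electricity only)")
```

Because idle draw is spread over fewer tokens when traffic is sparse, the same hardware at higher load would show a lower figure, consistent with the "much lower at scale" intuition.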

This is only the power usage?

Right, this is only power usage. Factoring in labor and all that would make it more expensive for sure. However, it’s not like it’s a complex system to maintain. We use a popular inference server and just run it with some modest rate limits. It’s been hands-off for close to a year at this point.

Ok! What hardware do you run? I had thought that would be the most expensive part.

Hardware spend also needs to be amortized (over 1 year? 2 years?), unless you rent in the cloud.

5 year amortization is pretty realistic I'd say. A100s (came out 2020Q1) are still in heavy use. (I think V100s from 2017Q3 are starting to be phased out a fair bit.)
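How much the amortization period matters can be sketched with straight-line depreciation. The server price, throughput and utilization below are hypothetical placeholders chosen only for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def hardware_cost_per_million_tokens(purchase_usd, amortization_years,
                                     tokens_per_second, utilization):
    """Straight-line amortized hardware cost per 1M generated tokens.

    utilization is the fraction of wall-clock time the box is actually
    serving at tokens_per_second; idle time still depreciates the hardware.
    """
    lifetime_tokens = (amortization_years * SECONDS_PER_YEAR
                       * tokens_per_second * utilization)
    return purchase_usd / lifetime_tokens * 1_000_000


# Hypothetical server; these figures are illustrative, not from the thread.
for years in (1, 2, 5):
    cost = hardware_cost_per_million_tokens(
        purchase_usd=200_000, amortization_years=years,
        tokens_per_second=2_000, utilization=0.3)
    print(f"{years}-year amortization: ${cost:.2f} per 1M tokens")
```

The amortization period is a straight divisor, so a 5-year schedule makes hardware 5x cheaper per token than a 1-year one, which is why it matters whether GPUs like the A100 stay in service that long.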

That is true too

10 years ago, we had nearly free ride-sharing and delivery. When a new company entered my market, I could usually get stuff cheaper through it than by walking to the shop they were picking it up from.

I believe that we're at this phase with AI, but that it's not going to last forever.


Sorry if I missed it, but how is a single token output from an LLM comparable to a search result from an engine? The author here compares 1k tokens (as an estimate for an average LLM single query response) to 1k web search queries. How is this not a factor of 1000 error?

> To compare a midrange pair on quality, the Bing Search vs. a Gemini 2.5 Flash comparison shows the LLM being 1/25th the price.

That is, 40x the price _per query_ on average (which is the unit of user interaction). LLMs with web-search will only multiply this value, as several queries are made behind the scenes for each user-query.

EDIT: thanks, zahlman, he does quote LLM prices in 1M tokens, or 1k user-queries, so the above concern is mistaken!


> The author here compares 1k tokens (as an estimate for an average LLM single query response) to 1k web search queries. How is this not a factor of 1000 error?

The author compares 1k uses of the LLM - resulting in an estimated 1M output tokens, and the prices are quoted per 1M tokens - to 1k uses of the search engine (the prices for which are directly quoted per 1k uses).


Gemini 2.0 Flash is listed at 0.4 USD / 1M tokens. Bing search API is 15 USD / 1k queries. So the LLM is indeed 37 times cheaper for a 1000 token query.
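The per-query normalization above can be reproduced directly. A sketch using the list prices as quoted in the comment (check the providers' current pricing before reusing them); like the comment, it counts only output tokens and ignores input-token cost.

```python
# List prices as quoted in the comment above (USD).
LLM_USD_PER_1M_OUTPUT_TOKENS = 0.40   # Gemini 2.0 Flash, per the comment
SEARCH_USD_PER_1K_QUERIES = 15.00     # Bing Search API, per the comment
TOKENS_PER_ANSWER = 1_000             # the article's assumed answer length

# Normalize both to "per user query" to avoid mixing per-token and per-query units.
llm_usd_per_query = LLM_USD_PER_1M_OUTPUT_TOKENS * TOKENS_PER_ANSWER / 1_000_000
search_usd_per_query = SEARCH_USD_PER_1K_QUERIES / 1_000
ratio = search_usd_per_query / llm_usd_per_query

print(f"LLM: ${llm_usd_per_query:.4f}/query, search: ${search_usd_per_query:.4f}/query")
print(f"search costs {ratio:.1f}x as much per query")
```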

The prices of search APIs have increased significantly in the past several years, even if you correct for inflation. I suspect part of this is increased use of ML/AI in search, so comparing LLMs to search APIs isn't as different as it might initially appear. It also isn't a very competitive market, so I would expect margins to be fairly high.

There have also been restrictions on API usage. In the case of Google, they put in place a hard cap on the number of requests you can make in a month; it isn't that hard to hit, and you can't buy more usage for any price*. And the Bing API is going to be shuttered by the end of the summer. I don't really know the reason for making an API hard to use, but it does suggest that the price is artificially high to discourage use, for whatever reason search engines don't want people using their APIs.

*: except maybe if you set up some special deal with Google, which probably requires you to know someone high up at Google.


What about training costs?

> Training GPT-4 may have cost in the vicinity of $50 million but the overall training cost is probably more than $100 million because compute is required for trial and error before the final training run.

https://ainowinstitute.org/publications/compute-and-ai


There is a problem with these LLMs, though: these companies will have to keep spending massive amounts of money on research unless they solve major issues with these models. These models are inherently depreciating assets, and they depreciate almost fully within months, as soon as either they or their competitors come out with a new model.

For example, Claude was undoubtedly the best model for software devs until Gemini 2.5 was released, and now I see people divided, with the majority of them leaning towards Gemini.

And there is very little room for mistakes, as we have seen with how Llama became completely irrelevant in a matter of months.

So while inference in itself can be profitable (again, that's a big *), these companies will have to keep fighting for what looks like decades, unless one of them actually solves hallucinations and reconstructs computer interfacing at a global scale!


> seen how llama became completely irrelevant in matter of months

Still seems pretty relevant to me:

https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

> Downloads last month 5,232,634

Scout, Maverick (and Qwen3) were a step backwards but so was Claude 3.7 for coding (people stuck with 3.5).

Seems like they can afford to make mistakes for the time being.

> So while inference in itself can be

Isn't it already profitable in some cases? E.g., how are platforms that only offer inference, like Kluster, and the providers serving Apache 2.0-licensed models on OpenRouter operating?


Author seems comfortable basing their complicated argument entirely on speculation and only one aspect of what it means to bring these services to market. Some numbers are pulled out of thin air with no context. It just doesn't seem worth taking seriously even IF there really is a misconception about the costs of running an LLM.

low to moderate quality digital text work is now almost free!

This is going to reshape large portions of our text based communication networks.


There's of course also the issue that an increasing fraction of web content reading is being done by AI agents. I wonder what the Pareto front here is.

No one has successfully rebutted that paper about stochastic collapse of AI models which happens when models train on their own output over time. It’s just a matter of time before we find out if it was right or not.

There are dozens if not hundreds of papers (by major research labs) showing that training on synthetic data for LLMs actually improves their performance. For instance, much of the RLHF done by MS/Facebook likely used data generated by an LLM. DeepSeek has also seen similar accusations thrown their way.

I believe the paper you're referencing was narrowly discussing text to image models and didn't incorporate the notion of prompt engineering and good old fashioned search to improve the quality of synthetic data.

it's been a while though, so i could be wrong. effectively i'm saying it's not quite as simple as that and isn't necessarily some unsolvable doomsday clock for all LLMs.


I don't think LLMs are inherently "costly" or "cheap". This doesn't really matter. Gold is pricey, but its usages justify the cost. Will LLMs, as they are used and evangelized now, have a true positive return for those using it? In some domains it will, most probably not everywhere and not for everyone.

> Gold is pricey, but its usages justify the cost

I understand the point, but gold is expensive because it is a traditionally agreed store of value, rather than because of its usage. Rhodium would be a better example.


I’m curious why gold usage justifies the cost ?

Just to be clear, I am saying usage does not.

traditions have value ;)

If this is true, or will become true soon with a year or two of efficiency improvements and the prices drop another 100x, then why the rush to build lots of new datacenters? Is it a bubble?

Why won’t we find out that we can get by with the data centers we have after they’ve gone through a machine upgrade cycle?


LLMs aren't cheap if you consider the impact on the climate and the cost that comes from it.

I will preface this by saying that I care a lot about climate change and carbon usage, and AI usage is not a big issue; it is in fact a distraction from where we should be focusing our efforts.

https://www.sustainabilitybynumbers.com/p/carbon-footprint-c...


Degrowth is a losing political argument.

You aren't getting what you want and you're helping the arsonists win elections by going with this strategy.

The winning argument is sustainable high growth with renewable energy.


Energy being renewable or not is unrelated to climate change.

A ton of CO2 is released through the production and burning of vegetable oil, for example.


The CO2 removed from the air by the soy plants from which the oil comes cancels out the CO2 added by burning the oil.

Literally all the carbon in the soy oil was pulled from the atmosphere.

That's where the phrase "carbon neutral" comes from.


You seem to easily forget all the co2 used to farm those plants and transport the oil.

There is no such thing as carbon neutral as long as some metal, plastic and petrol is involved.


I agree that the production of soy oil in the current economy uses a lot of fossil fuel.

But you brought up soy oil to support your assertion that "energy being renewable or not is unrelated to climate change", which is wrong.

If it were possible to produce soy oil without using fossil fuels, then producing, using and burning soy oil would be carbon neutral.

The argument you want is that we cannot simply assume that the world's economy can be weaned off of fossil fuels without a permanent and severe reduction in living standards. Maybe it can be done, maybe it cannot: it depends on how efficient the substitute energy sources get.


Watching TV isn’t cheap if you consider the impact on the climate and the cost that comes from it.

no one said it's cheap though

I would like to understand if this still has truth to it.

I didn't realize Large Language Models have a direct impact on the climate.

Well, running them does. And, from what I get from the article, that's what they're trying to do: either running them or having someone do it for them as a service.

How big is that impact? Well, that's a complicated issue.


Running LLMs does not have any intrinsic impact on the climate.

If you want to talk about the impact of different power generation methods on climate change, fair enough, but I don't think this thread is the place for it. Unless of course the idea is to talk about climate change in every single thread centered on "things that consume energy", which is approximately all of them.


Ah no, I was just strictly defining OP's comment.

Of course, I was setting up a plain definition without magnitude. The impact could be near nil, or could be huge, but an impact nonetheless.

I didn't want to deep dive on it, as I thought it would sidetrack the comments. I think this is a subject that merits some analysis, as in some cases the discourse around the energy use has been akin to the one at the height of the past crypto/NFT cycle: "Yeah, it's not clean but it could be."

And to that:

- It could be, but it isn't (not only because of the popular gas turbines, but also because of collateral damage like water use and heating, and other social issues around it)

- But also (and this would be more philosophical/social), even if it were clean... is it worth it?


How about indirect? At any rate, something is going on, because our summers are hotter and hotter, and there is no snow during our winters. We are all noticing it, but it gets shrugged off as "misremembering". I am not attributing it to running LLMs alone, however, but climate change seems real enough to me; I experience it. It is barely July and I am dying! We used to have more tolerable weather around this time of year, for a long time.

Yes, but what does climate change have to do specifically with LLMs? How are they different from any other use of energy? As far as I can tell they are better than most uses, given that (as software) they run entirely with electricity, which of course can be generated with near-zero CO2 emissions.

Given that, this interjection about climate change seems like a complete non-sequitur to the topic at hand.


I do not think it has anything to do with LLMs. That is probably the least of our issues.

Building and running LLMs is an energy intensive task. Those GPUs do not power themselves from nothing.

LLMs are heavily subsidised. If you self-host them and run them at cost, then you find that the GPU costs are high, and that's largely without the additional tools that OpenAI and Anthropic provide and which also must cost a lot to operate.

If you self-host, you likely won't have anywhere near enough volume to do efficient batching, and end up bottlenecked on memory rather than compute.

E.g. based on the calculations in https://www.tensoreconomics.com/p/llm-inference-economics-fr..., increasing batch size from 1 to 64 cuts the cost per token to 1/16th.
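Why batching helps so much can be sketched with a simple roofline model of decoding: every step streams all model weights from memory once, shared by the whole batch, while compute grows with batch size. This is a toy model, not the linked article's method; the hardware and model figures are hypothetical, and real servers have more overheads, so the exact ratio will not match the 1/16th quoted above.

```python
def step_time_s(batch, weight_bytes, kv_bytes_per_seq, mem_bw_bytes_s,
                flops_per_token, peak_flops):
    """Time for one decode step under a simple roofline model.

    Each step reads all weights once (shared by the batch) plus each
    sequence's KV cache; compute grows linearly with batch size.
    """
    memory_time = (weight_bytes + batch * kv_bytes_per_seq) / mem_bw_bytes_s
    compute_time = batch * flops_per_token / peak_flops
    return max(memory_time, compute_time)


def relative_cost_per_token(batch, **hw):
    # Each step yields `batch` tokens, so GPU-seconds per token = step time / batch.
    return step_time_s(batch, **hw) / batch


# Hypothetical hardware and model numbers, chosen only for illustration.
hw = dict(
    weight_bytes=16e9,       # ~8B params at 2 bytes each
    kv_bytes_per_seq=0.5e9,  # assumed KV cache per sequence
    mem_bw_bytes_s=2e12,     # assumed memory bandwidth
    flops_per_token=16e9,    # ~2 FLOPs per parameter per token
    peak_flops=300e12,       # assumed usable compute
)

base = relative_cost_per_token(1, **hw)
for b in (1, 8, 64):
    print(f"batch {b:>2}: cost per token is 1/{base / relative_cost_per_token(b, **hw):.1f} of batch 1")
```

With these made-up numbers the server stays memory-bound even at batch 64, which is exactly the regime where sharing each weight read across more sequences pays off; a low-traffic self-hosted setup rarely gets there.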


Before I started self-hosting my LLMs with Ollama, I imagined that they required a ton of energy to operate. I was amazed at how quickly my local LLM operates with a relatively inexpensive GeForce RTX 4060 with 8GB VRAM and an 8b model. The 8b model isn't as smart as the hosted 70b models I've used, but it's still surprisingly useful.

I think this is a good analysis but falls a little short. Sure, the price is not high for inference, but what about the cost? To be fair, the author already tries to answer this claim, but you could look more critically at this question. Something like: taking into account the insane amount of capital that is being spent and injected into AI companies, what is the strategy to break-even in a reasonable amount of time? What would be the implications for the price over time from now on? That's an interesting thought experiment that, at least in my head, raises the question if the price we're paying for inference today is actually fair.

One thing that is making them cheap is the lack of moats. If anybody can provide the same service, the market will push the prices down eventually, as this is a model demand-supply situation. OpenAI has the advantage due to brand awareness, but that is more or less it for them. Most users would probably not notice if you would switch the product they are using. For this reason, I think that companies that already have some channels to get their products on users' screens - Google, MS, Apple - have theoretically the best position to control the market. But practically, they do not seem very keen to do so.

I tried to read carefully so please forgive me if I missed it, but I didn't see anything addressing response provenance or performance as in accuracy/other_metric.

The difference between a web search and text generated by an LLM can be that a web search points to actually existing companies or entities that make statements. Sometimes this is required and summaries or paraphrase are insufficient.

For performance or whatever metric one chooses, it's likely that web search wins some and loses some. It would be interesting to see an in-depth shoot-out.


Search is narrow, used occasionally to find external information. LLMs are the single most general-purpose tool in existence. If you're using them to their full potential, you end up relying on them across writing, planning, coding, summarizing, etc.

So even if the per-query or per-token cost is lower, the total consumption is vastly higher. For that reason, while it may not be a fair comparison, due to people looking at it from the perspective of personal economics, people will compare how much it costs to use each to its full potential, respectively.


> LLMs are the single most general-purpose tool in existence.

Wouldn't this award have to go to computers? They're a prerequisite for using LLMs and can do a lot more besides running LLMs.


Yes, "tool" is probably not the right term. Application?

"Data from paid API queries will also typically not be used for training or tuning the models, so getting access to more data wouldn't explain it."

Source? Is this in the API ToS?


OpenAI: https://platform.openai.com/docs/guides/your-data

> As of March 1, 2023, data sent to the OpenAI API is not used to train or improve OpenAI models (unless you explicitly opt in to share data with us).

Anthropic: https://privacy.anthropic.com/en/articles/7996868-is-my-data...

> By default, we will not use your inputs or outputs from our commercial products to train our models.

> If you explicitly report feedback or bugs to us (for example via our feedback mechanisms as noted below), or otherwise explicitly opt in to our model training, then we may use the materials provided to train our models.

Google: https://ai.google.dev/gemini-api/terms#paid-services

> When you use Paid Services, including, for example, the paid quota of the Gemini API, Google doesn't use your prompts (including associated system instructions, cached content, and files such as images, videos, or documents) or responses to improve our products


> Data from paid API queries will also typically not be used for training or tuning the models...

Extremely unlikely in my opinion. I would expect some forms of customer data are used for some kind of value or competitive advantage. If not used outright, this might still include transformed, summarized, aggregated, or anonymized data. In my view, various mappings from legal terms of service to the myriad ways data can be massaged leads to massive gray areas. Expecting this to tilt in favor of customer privacy does not match historical practice nor incentives.


They are cheap as long as subsidized by VC and (in some cases) government money. We will see the real cost when free tiers disappear or start to be supported by ads.

Given how addicted people are to using LLMs steep price hikes are almost certainly guaranteed at some point.

When this happens, what we will see is once again the rich and privileged will benefit from the use of LLMs while the poor have to just rely on their own brains. Consider how some students will have to grow up struggling through school without any LLMs while rich kids breeze their way through everything with their assistants.


That would assume that there's a moat...

If they are dividing a few billion dollars in model training between a small number of rich people, it quickly becomes too expensive even for them.

Meanwhile, a free model running locally is good enough for most people. This causes pricing pressure (and I think is probably going to bankrupt most of the AI companies).

More likely IMO is that AI becomes a loss-leader. It'll all be stuff like Grok or DeepSeek where the real profit is in censorship and propaganda.


Not sure if you're being sarcastic. The cost of compute keeps going down, though it is getting harder to scale. I feel like LLMs will become ubiquitous. When I went to university in the '90s, only the wealthy could afford cell phones; pulling one out was a flex. Now they are everywhere. Even Nvidia's sky-high margins will someday be eroded.

There is more to cost, and cheapness, than the direct financial cost.

The ecological cost is far more important.

Arguably, the impacts on employment, careers, theft of creative works, and other damage from inexpensive LLM bots are, in the short term and in terms of impacts on mere human lives, more imminent and pressing.


Cheap by what measure? Surely not by the carbon footprint of the large, capital-intensive datacenters going up in droves to support them? Surely not given the revenue being generated by one silicon design company at the moment?

I think this article is measuring all the wrong things and therefore comes to the wrong conclusion.


These comments are such an amazing show of very knowledgable and wise people, and complete moppets that have no clue, and can't stop playing expert on things they have no clue on.

:pop-corn:


I’ve noticed that LLMs are fast and cheap for simple Q&A. They just generate answers without needing to crawl or index. But I wonder if this low cost will last. As the LLM race continues, everyone will need to keep retraining and fine-tuning, and that will push up the costs.

What feels missing is: cheap with respect to what? You have to analyze the value created for a service and the value captured by the company. E.g., if you use LLMs to multiply two numbers, it's doing billions of computations to produce one result that is predicted incorrectly the majority of the time.

I would add a small asterisk that a given sentence may result in different number of tokens depending on the model and the tokenization method they use, so it’s unfortunately not as straightforward to get the precise dollar value for a given input.
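To illustrate the tokenization point, here is a sketch with two toy tokenizers, a word-and-punctuation splitter and fixed 4-character chunks (loosely echoing the commonly quoted "about 4 characters per token" rule of thumb for English). Neither is a real model's tokenizer; the price is a hypothetical placeholder. The same sentence yields different token counts, and so different bills at the same per-token price.

```python
import re


def word_tokens(text):
    # Toy tokenizer 1: each word and each punctuation mark is one token.
    return re.findall(r"\w+|[^\w\s]", text)


def chunk_tokens(text, size=4):
    # Toy tokenizer 2: fixed-size character chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]


def cost_usd(n_tokens, usd_per_million):
    return n_tokens * usd_per_million / 1_000_000


USD_PER_MILLION = 0.40  # hypothetical price, same for both "models"
sentence = "Tokenization differs between models, so prices per token aren't directly comparable."

for name, tokenize in (("word-level", word_tokens), ("4-char chunks", chunk_tokens)):
    n = len(tokenize(sentence))
    print(f"{name}: {n} tokens, ${cost_usd(n, USD_PER_MILLION):.8f}")
```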

I clicked around and found this rather addicting word game by the same author: https://huewords.snellman.net/

OpenAI lost 5 billion dollars last year... But yes, LLMs are cheap.

Guess it comes down to how heavy the query is in context size. If you’re not doing RAG and instead just inlining large amounts then it won’t stay cheap.

But yeah, $0.20 per million is nothing for light use.
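How much context size dominates can be seen by pricing one question both ways. A minimal sketch, assuming the $0.20/1M figure above is the input-token price; the output price, corpus size and retrieved-chunk sizes are hypothetical.

```python
def query_cost_usd(input_tokens, output_tokens, in_usd_per_m, out_usd_per_m):
    return (input_tokens * in_usd_per_m + output_tokens * out_usd_per_m) / 1_000_000


IN_PRICE = 0.20   # assumption: the $0.20/1M above is the input price
OUT_PRICE = 0.80  # hypothetical output price
question_tokens, answer_tokens = 200, 500

inline_context = 200_000  # hypothetical: whole document set pasted into the prompt
rag_context = 4 * 500     # hypothetical: top-4 retrieved chunks of ~500 tokens

inline = query_cost_usd(question_tokens + inline_context, answer_tokens, IN_PRICE, OUT_PRICE)
rag = query_cost_usd(question_tokens + rag_context, answer_tokens, IN_PRICE, OUT_PRICE)
print(f"inlined: ${inline:.5f}/query, RAG: ${rag:.5f}/query, ratio {inline / rag:.0f}x")
```

Input tokens are billed on every request, so a large inlined context is paid for again with each question, while retrieval keeps the prompt small.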


Here's a question I don't know the answer to right off, but we're going to discover together in real time: Which is cheaper, one million queries to fetch the value of a secret from AWS Secrets Manager, or one million tokens from a modern LLM?

Storing a secret in secrets manager: $0.40

$0.05/10,000 API calls * 1,000,000 calls = $5

Total cost: $5.40.

Gemini 2.5 Flash: $0.15/million tokens.

Well, there you have it. Fetching a secret from AWS Secrets Manager is ~36 times more expensive per API call than generating one token from an LLM!
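The same arithmetic as a script, using the prices exactly as quoted in the comment (not re-checked against current pricing pages). Note that it compares one million API calls with one million tokens, not one million LLM queries.

```python
SECRET_STORAGE_USD_PER_MONTH = 0.40   # as quoted
USD_PER_10K_API_CALLS = 0.05          # as quoted
LLM_USD_PER_M_TOKENS = 0.15           # Gemini 2.5 Flash, as quoted

calls = 1_000_000
secrets_cost = SECRET_STORAGE_USD_PER_MONTH + USD_PER_10K_API_CALLS * calls / 10_000
llm_cost_1m_tokens = LLM_USD_PER_M_TOKENS * 1_000_000 / 1_000_000
ratio = secrets_cost / llm_cost_1m_tokens

# For contrast: one million LLM *queries* at ~1,000 tokens each.
llm_cost_1m_queries = LLM_USD_PER_M_TOKENS * (1_000_000 * 1_000) / 1_000_000

print(f"Secrets Manager: ${secrets_cost:.2f}, 1M tokens: ${llm_cost_1m_tokens:.2f} ({ratio:.0f}x)")
print(f"1M queries of ~1k tokens: ${llm_cost_1m_queries:.2f}")
```

At roughly 1,000 tokens per answer, a million LLM queries would cost about $150, so the comparison flips once you count per query rather than per token.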


By that logic, streaming was cheap to operate and produce because Netflix cost way less.

And now?


Completely ignores externalized costs, and focuses entirely on purely end-user retail costs to operate, not even vendor internal operational costs. Can't even find the words "energy" or "electricity" or "scale" in the post. Whatever point this person is making, it is of such dramatic limitation that I am going to contentedly ignore it. Point people at this all you like, Juho Snellman. I for one will merely ignore you.

Also laughably excludes this one from openai's pricing details:

o1-pro-2025-03-19 Price per 1M tokens Batch API price -- Input: $150.00, Output: $600.00

And this doesn't even address quality. Results quality is also explicitly ignored. I personally find most results from cheaper models to be far, far worse than any results I find using search prior to the LLM content flood. But of course, that's 1) subjective, and 2) completely impossible to conduct any analytical comparison now since indexed search has been so completely ruined by SEO and LLM junk. Yet another externalized cost for which accounting is completely impossible, but is likely to have immeasurably negative impacts on the world's ability to share information.


I want to see how this pricing compares with searxng.

It still boggles my mind the grip had on search, it's SO expensive (especially considering it's otherwise free for humans)

This article is naive beyond belief. Not worth responding to.

> LLMs are cheap

Runs to shop to buy GPU rig.


Here's a relevant article (Performance Comparison section): https://open.substack.com/pub/cjr5480/p/analytical-modeling?... I think it is more expensive than the prices lead you to believe and I don't think that's the right way to do that math given the extreme competition for market share.

[flagged]


This only means that you can't extract enough value from the LLM to make up for costs. Or maybe you are using the wrong (too expensive) model.

what's with all this unnecessary math?

Cursor is $20/month

Zed is $20/month

and they both have "burn mode"/"max mode" since you can hit your limit in a matter of hours.

LLMs are NOT cheap

they'll be cheap when they run on consumer hardware.



