A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true and a lot of evidence that it isn't. It's just become a meme uncritically regurgitated.
This sloppy Forbes article has polluted the epistemic environment because now there's a source to point to as "evidence."
So yes, this post author's estimation isn't perfect, but it is far more rigorous than the original Forbes article, which doesn't appear to even understand the difference between Anthropic's API costs and its compute costs.
I'd love to be a fly on the wall when this argument is tried in front of a bankruptcy court. It drives me nuts. Of course there's evidence that they're selling tokens at a loss.
The only thing these companies sell is tokens. That's their entire output. OpenAI is trying to build an ad business, but it must still be quite small relative to selling tokens because I've not yet seen a single ad on ChatGPT. It's not like these firms have a huge side business selling Claude-themed baseball caps.
That means the cost of "inference" is all their costs combined. You can't just arbitrarily slice out anything inconvenient and say that's not a part of the cost of generating tokens. The research and training needed to create the models, the salaries of the people who do that, the salaries of the people who build all the serving infrastructure, the loss leader hardcore users - all of it is a part of the cost of generating each token served.
Some people look at the very different prices for serving open weights models and say, see, inference in general is cheap. But those costs are distorted by companies trying to buy mindshare by giving models away for free, and on top of that, both top labs keep claiming the Chinese labs are distilling their models like crazy, using many tactics to evade blocks! So apparently the cost of a model like DeepSeek is still partly being subsidized by OpenAI and Anthropic against their will. The cost of those tokens is higher than what's being charged; it's just being shifted onto someone else's books. Nice whilst it lasts, but this situation has been seen many times in the past, and eventually people get tired of having costs externalized onto them.
For as long as firms are losing money whilst only selling tokens, that means those tokens are selling at a loss. To not sell tokens at a loss the companies would have to be profitable.
The article is about compute cost though. By "lose money on inference" I mean the assertion that inference has negative gross margins, which a lot of people truly believe. This is important because it's common to reason from this that LLMs are uneconomical and a ticking time bomb where prices will have to be jacked up several orders of magnitude just to cover the compute used for the tokens.
But there's no such thing as compute cost in the abstract. What exactly is compute cost for AI? Does it include:
• Inference used for training? Modern training pipelines aren't just gradient descent, there's a ton of inference used in them too.
• Gradient descent itself?
• The CPUs and disks storing and managing the datasets?
• The web servers?
• The people paid to swap out failed components at the dc?
Let's say you try to define it to mean the same as unit economics: what does it cost you to add an additional customer vs. what they bring in? There's still no way to do this calculation. It's like trying to compute the unit economics of a software company. Sure, if you ignore all the R&D costs of building the software in the first place and all the R&D costs of staying competitive with new versions, then the unit economics look amazing, but there are still plenty of loss-making software startups in the world.
Unit economics are a useful heuristic for businesses where there aren't any meaningful base costs required to stay in the game because they let you think about setup costs separately. Manufacturing toys, private education, farming... lots of businesses where your costs are totally dominated by unit economics. AI isn't like that.
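The software-company analogy can be made concrete with a toy calculation (all numbers here are hypothetical, just to show the shape of the problem):

```python
def annual_profit(customers, price, marginal_cost, fixed_costs):
    """Contribution margin from all customers minus fixed costs."""
    return customers * (price - marginal_cost) - fixed_costs

# Hypothetical startup: $100/seat/yr, ~$5 marginal cost to serve a seat,
# $2M/yr of fixed R&D. The per-unit margin looks amazing...
unit_margin = (100 - 5) / 100
print(unit_margin)                               # 0.95

# ...yet with these made-up numbers the company still loses money overall:
print(annual_profit(10_000, 100, 5, 2_000_000))  # -1050000
```

The unit margin tells you nothing about whether the fixed base costs are ever recouped.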
Gross margins and cost of revenue are well defined accounting terms that apply to any type of business.
> Does it include:
> Inference used for training? Modern training pipelines aren't just gradient descent, there's a ton of inference used in them too.
No because this is training and not inference. Just like how R&D costs for a drug aren't part of COGS either.
> Gradient descent itself?
No
> The CPUs and disks storing and managing the datasets?
Yes
> The web servers?
Yes
> The people paid to swap out failed components at the dc?
Yes to the extent they are swapping for inference and not training. If the same employees do both then the accountants will estimate what percent of their time is dedicated to each and adjust their cost accordingly.
We weren't talking about COGS, we were talking about "cost of compute", which isn't an accounting term.
For the rest, anyone can define and apply an accounting metric but that doesn't mean it tells you anything useful. If you look at the unit cost of any typical IP business it's nearly zero. Yet, many companies lose money on making movies, video games, apps and books.
I'm not familiar with accounting, but I suspect a lot of these cloud infrastructure companies don't throw out hardware for a very long time. AWS, for example, sells its old hardware as white-label compute at a markup. I think as long as Anthropic keeps finding uses for the old GPUs, provided they don't break, it doesn't have to write off those assets, which means it doesn't incur costs using them if it's clever with its books.
The marginal cost of the next token. That can include the power, the operating cost of the facility, repair costs, etc.
The API price should hopefully incorporate the capitalized cost of the hardware, the facility rent, the cost to train the model, the r&d, cost of sales, etc., to make it profitable.
Claude Code Max may be able to offer a good price by having a mix of higher- and lower-utilization users and ignoring the fixed costs, treating it as a driver of API sales. But it doesn't make sense to essentially pay people to use it.
Your point that there are more relevant quantities to calculate when checking economic viability is fair, but that doesn't negate the "cost of inference" being an interesting metric in itself.
What you are talking about isn't inference cost. Yes, fundamentally what matters is all the work that goes into the models, including R&D, training, and inference.
But we talk about inference separately for a reason: inference cost is largely the scaling cost. Once you have a model, the margin on your inference is how you get to profitability; as long as your margin is positive, you can make the entire enterprise profitable by just selling more tokens. This is the same fundamental business that chip fabs work on. Yes, it costs them a lot to get to the next node, but what's important is the margin they can get on the wafers they sell, because they sell tonnes of wafers.
It's pretty core to the concept of SaaS businesses that yes, you do consider all costs. But you want to focus on the margin of the bit that scales. This is why WeWork exploded: the thing they were scaling only scaled up at a negative margin.
The point is that if their inference margin is positive, they can "just" scale up and become profitable. If their inference margin is negative, then scaling up the business actually causes problems.
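That asymmetry is easy to sketch with made-up numbers: a positive per-token margin gives a finite breakeven volume, while a negative one means more volume only deepens the loss.

```python
def breakeven_tokens_mtok(fixed_costs, price_per_mtok, serve_cost_per_mtok):
    """Annual token volume (in millions of tokens) at which contribution
    covers fixed costs. Returns None when the per-token margin is
    non-positive: no breakeven volume exists."""
    margin = price_per_mtok - serve_cost_per_mtok
    if margin <= 0:
        return None  # selling more tokens only increases the loss
    return fixed_costs / margin

# Hypothetical: $1B/yr of fixed costs (R&D + training), $10/Mtok price,
# $4/Mtok marginal serving cost:
print(breakeven_tokens_mtok(1_000_000_000, 10, 4))  # ~166.7M Mtok/yr
print(breakeven_tokens_mtok(1_000_000_000, 4, 10))  # None: scaling can't save you
```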
Actually, you can slice out a lot of things. It's even a GAAP metric, i.e. one of the common baselines that public companies are required to report, known as gross margin: literally just (revenue - cogs) / revenue. It is distinct from net margin, but both are useful, and a low gross margin vs. a low net margin say very different things about the long-term prospects of the business.
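As a minimal sketch of those two metrics (the example figures below are made up):

```python
def gross_margin(revenue, cogs):
    """GAAP-style gross margin: (revenue - cost of revenue) / revenue."""
    return (revenue - cogs) / revenue

def net_margin(revenue, total_costs):
    """Net margin: includes R&D, training, SG&A, everything."""
    return (revenue - total_costs) / revenue

# Hypothetical lab, figures in $B: 1.0 revenue, 0.6 cost of revenue
# (serving), 1.4 total costs once R&D and training are included.
print(gross_margin(1.0, 0.6))  # 0.4   -> inference itself looks profitable
print(net_margin(1.0, 1.4))    # ~-0.4 -> the company as a whole loses money
```

Both can be true at once, which is exactly the disagreement in this thread.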
This is all true, but it isn't really important for the argument people are making. What is more important is the marginal cost per token. If each token were sold at a marginal loss, their losses would scale with usage, and that simply can't be happening with API pricing. But in general, yes, I agree with you, and I'm sure they are taking a huge loss on Claude Code.
They are certainly making huge bets that are risky, and so yes on their P&L the L are scaling. That doesn't say anything at all about their marginal inference cost.
One very minor note: Anthropic and others, like most "enterprise" solutions, also sell SSO + SCIM + audit logs. Their business plans have lower token limits and higher prices to cover the enterprise features, which should be essentially free to provide in 2026.
It depends how we are looking at the business. Absolutely, at the end of the day a company is profitable or not, but when thinking about inference, which is largely a commodity these days, you would first think about the marginal cost of it. That is the cornerstone of the business. We have a pretty clear indication that API tokens are largely being sold above marginal cost. Especially for a brand new business that's critical, and something that many unicorns never even hit.
You're right that all other costs are critical to measuring the profitability of the business, but for such a young industry that's the unknown. Does training get cheaper? Do we hit a theoretical limit on training? Are there further optimizations to be had?
You don't take on large capex in an industrial business and then, in year one, argue that the business is doomed because you're selling the product above marginal cost but haven't yet recouped the costs that have been capitalized.
> A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true
There's quite a lot of evidence. No proof, I'd agree, but then there's no absolute proof I'm aware of to the contrary either, so I don't know where you're getting this from.
The two pieces of evidence I'm aware of are that 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money they're making off it isn't enough, and 2) last time I checked, API spending is capped at $5000 a month
Like I say, neither of these is proof; you can come up with reasonable arguments against them, but once again the same could be said for evidence to the contrary.
> which would imply that the money they're making off it isn't enough
I don't think this logically follows. An unlimited buffet doesn't let you resell all of the food out the backdoor. At some level of usage any fixed price plan becomes unprofitable.
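The buffet point is really just a ratio; a sketch with made-up numbers:

```python
def breakeven_usage_mtok(plan_price, serving_cost_per_mtok):
    """Monthly token volume (in millions) above which a flat-rate
    subscriber costs more to serve than they pay."""
    return plan_price / serving_cost_per_mtok

# Hypothetical $200/mo plan with a $2 serving cost per million tokens:
print(breakeven_usage_mtok(200, 2))  # 100.0 -> unprofitable past ~100M tok/mo
```

Restricting where the plan can be used is consistent with trimming the tail of users above that threshold, not with every token being sold at a loss.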
I agree the 5k cap is interesting as evidence although as you said I suspect there are other reasons for it.
As for evidence against it: The Information reported that OpenAI and Anthropic have had 30%+ gross margins for the last few years. Sam Altman and Dario have both claimed inference is profitable in various scattered interviews. Other experts seem to generally agree too. A quick search found a tweet from former PyTorch team member Horace He: https://x.com/typedfemale/status/1961197802169798775 and a response to it in agreement from Anish Tondwalkar, a former researcher at OpenAI and Google Brain.
Nor Dario's, frankly; I was supposed to be out of a job by now according to his predictions over the years. I can totally buy that inference is profitable, but not because they said it is.
> 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money they're making off it isn't enough
Claude Code use-cases also differ somewhat from general API use, where the former is engineered for high cache utilization. We know from overall API costs (both Anthropic and OpenRouter) that cached inputs cost an order of magnitude less than uncached inputs, but OpenCode/pi/OpenClaw don't necessarily have the same kind of aggressive cache-use optimizations.
Vertically integrated stacks might also be able to have a first layer of globally shared KV cache for the system prompts, if the preamble is not user specific and changes rarely.
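Why cache hit rate matters so much can be sketched with hypothetical prices, loosely following the roughly order-of-magnitude cached/uncached gap mentioned above:

```python
def input_cost(mtok, cache_hit_rate, uncached_price, cached_price):
    """Total input-token cost for `mtok` millions of tokens, given the
    fraction of them served from cache (all prices per Mtok, hypothetical)."""
    cached = mtok * cache_hit_rate * cached_price
    uncached = mtok * (1 - cache_hit_rate) * uncached_price
    return cached + uncached

# 1000 Mtok of input at a made-up $3/Mtok uncached vs $0.30/Mtok cached:
print(input_cost(1000, 0.0, 3.0, 0.30))  # 3000.0 -> no caching
print(input_cost(1000, 0.9, 3.0, 0.30))  # ~570   -> aggressive agentic caching
```

A harness engineered for high cache utilization can plausibly serve the same tokens at a fraction of the naive cost, which is why CC-style usage and generic API usage aren't directly comparable.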
> 2) last time I checked, API spending is capped at $5000 a month
Per https://platform.claude.com/docs/en/api/rate-limits, that seems to only be true for general credit-funded accounts. If you contact Anthropic's sales team and set up monthly invoicing, there's evidently no fixed spending limit.
> If you contact Anthropic's sales team and set up monthly invoicing, there's evidently no fixed spending limit.
I don't think that's a smoking gun either. For a start, we don't know if the pricing would be the same as you'd get credit-funded, but also a monthly invoicing agreement is closer to their fixed plans (you spend X per month, regardless of usage) than pay-per-use API credits, which may not be profitable.
Not that that's a smoking gun either; I can see it both ways.
But a simple assumption that Anthropic runs a normal large MoE LLM (which it almost certainly does) suggests that the actual price of running it (mostly energy) is pretty small.
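A rough back-of-envelope along those lines, with every figure hypothetical (node power, throughput, and electricity price are all assumptions, not measured values):

```python
def energy_cost_per_mtok(node_power_kw, tokens_per_sec, usd_per_kwh):
    """Raw electricity cost per million output tokens for one serving node."""
    usd_per_hour = node_power_kw * usd_per_kwh
    mtok_per_hour = tokens_per_sec * 3600 / 1e6
    return usd_per_hour / mtok_per_hour

# Assume an 8-GPU node drawing ~10 kW, batch-serving ~5000 tok/s aggregate,
# at $0.10/kWh:
print(round(energy_cost_per_mtok(10, 5000, 0.10), 3))  # ~0.056 USD per Mtok
```

Even if these assumptions are off by an order of magnitude, the pure energy cost lands well below typical per-Mtok API prices; the hardware capex and everything else is where the real money goes.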
> A huge number of people are convinced that OpenAI and Anthropic are selling inference tokens at a loss despite the fact that there's no evidence this is true and a lot of evidence that it isn't.
I think it’s fairly obvious that Anthropic is lighting cash on fire and focusing on whether or not they’re losing money per token on inference is missing the forest for the trees.
Tokens become less valuable when the models aren’t continuously trained and we have zero idea what Anthropic is paying for training.
They are, and they are convinced the cost is not truly baked in because you need to factor in all the training and R&D. It's a mixture of folks that 1) are convinced AI is terrible, 2) hate Sam Altman, and 3) don't understand how businesses price products.
We don't have clear evidence either way, but it heavily leans toward API pricing at least covering inference cost. Models these days have less and less differentiation, and for API use there must be some thought given to competing on cost, but it's not going to be winner-take-all. They leapfrog each other with each new model.
I think the wafer scale compute is a massive deal. It's already being leveraged for models you can use right now and the reception on HN has been negligible. The entire model lives in SRAM. This is orders of magnitude faster than HBM/DRAM. I can't imagine they couldn't eventually break even using hardware like this in production.
To the extent that this is even true, it appears to be caused by three things: stock option compensation accounting, R&D deductions, and bonus depreciation.
Stock option compensation rules have been a boon because Meta stock has risen 6x in three years. It's unlikely to do that again. My understanding is that this is symmetrical so if the stock trends down we will see an inflated effective tax rate for Meta.
Recent R&D rule changes allowing software engineering salaries for R&D to be written off seem reasonable and were quite popular on Hacker News. Previously these expenses were amortized over five years, so this just pulls the deduction forward. Subsequent years will see depressed expenses.
Bonus depreciation is once again just pulling forward legitimate expenses earlier than before. At worst they are just delaying giving the government its taxes and the corporations gain a few points of interest in between.
All of the tax rules used here are open to debate but none seem obviously wrong or nefarious. This is why people like Reich choose to keep things vague. Corporations brazenly stealing from your pocket is much more interesting than the mundane reality.
Most small businesses are pass through entities in the United States and pay no corporate taxes at all so it's certainly not the case that "The game is heavily rigged to favor large companies."
21% has been the highest possible corporate tax rate since 2017. It's not really fair to compare what Meta pays now to what you paid under an entirely different tax regime. You would also pay less in taxes running your business today than you did previously.
There has been extensive debate around that topic since that paper came out. Some points to discuss:
1. Even the article you shared mentions that starting in 2003, earnings stopped tracking productivity: "Total compensation remains close until 2003, but does not follow 2003’s uptick in productivity growth (behavior which remains a topic for future research)."
2. They use average earnings and not median earnings. Average earnings include people like CEOs. This by consequence shows that inequality among workers has also increased. Check out chart 4 here to see how much smaller median wages are compared to average: (https://www.csls.ca/ipm/23/IPM-23-Mishel-Gee.pdf)
3. Apart from the average vs median difference, the biggest point of contention between that study and more recent ones is the measure of inflation used. The 2007 study you cite uses a measure of inflation that also includes things paid by employers, like medical insurance. It turns out that using that one leads to significantly lower inflation. If you use the consumer price index, which reflects what workers actually pay out of pocket, the difference again becomes larger. Citing page 37 of the study above: "In other words, that the prices of consumer items has risen faster than a broader index of prices that includes net exports, government goods and services, and investment goods. Therefore, for a given increase in income, the purchasing power of the consumer has fallen faster than that of business for investment goods and foreigners for U.S. exports."
The article I shared before plus this other one describe all the discrepancies (https://www.epi.org/productivity-pay-gap/). Especially see chart 10 in the PDF study, which shows all possible variations of how you measure productivity and income. No matter how you look at it, the most substantiated conclusion is that income has NOT matched productivity.
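The average vs median point (item 2 above) is easy to illustrate with made-up wages: a handful of very high earners pulls the average up while the median barely moves.

```python
import statistics

# 99 ordinary workers and one very highly paid executive (all wages invented):
wages = [40_000] * 90 + [120_000] * 9 + [5_000_000]

print(statistics.mean(wages))    # 96800   -> "average earnings", pulled up by the top
print(statistics.median(wages))  # 40000.0 -> what the typical worker actually earns
```

Any series built on average earnings will track gains captured at the top, which is exactly why the choice of statistic changes the conclusion.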
First, taxes still get paid when the individual dies as estate tax. Second, increased shareholder value typically means more corporate profit which is also taxed. Third, dividends are taxed. So your claim that the shareholder value never makes its way into the tax system is plainly false.
This is all aside from the fact that increased shareholder value means a more abundant society regardless of the increase in taxes. We could quibble over the exact distribution of who gains from the enlarged pie, but it's certainly not the case that 100% of it goes to capitalists, so consumers and employees also benefit.
> taxes still get paid when the individual dies as estate tax
Almost no one in the US pays the estate tax. It only applies to estates over $14MM and most large estates get reorganized into trusts with estate tax avoidance as a primary motive.
Yes this entire conversation is about the ultra wealthy not paying their "fair share". A $14MM exemption is practically irrelevant here.
> most large estates get reorganized into trusts with estate tax avoidance
This isn't so simple. Transfers to an irrevocable trust count against your lifetime $14MM estate and gift tax exemption, and a trust in excess of the $14MM exemption is subject to gift tax.
Also, this discussion was about "Buy Borrow Die" strategy. Irrevocable trusts don't make much sense in this context because trusts aren't subject to stepped up basis.
It's implicit. Amazon has billions of dollars because customers freely handed over the money. We know they found the service valuable because they wouldn't have done so otherwise.
The poster is suggesting there is some _true_ value separate from what these customers who know their own situations best think. That they are secretly being fleeced and a central planner will somehow better allocate the resources.
"The ultra-wealthy should have less power" != "We should implement a five-year plan for our command economy as thought up by glorious and correct Party."
Twitter pays more for US impressions, so slop accounts often target a US audience, and the payments are relatively more attractive to people in less developed countries. That's aside from the fact that Americans are only 4% of the world population. What about this is surprising?
There is no evidence presented that there is any state sponsored conspiracy going on. Nor would you need one to explain what we're seeing.
The author also presents no evidence that Pro-Trump accounts are disproportionately represented among accounts lying about their country of residence.
Ultimately, this is just evidence-free gesturing at some grand conspiracy. Cherrypicking the bits that are red meat to her (and apparently HN's) audience.
The scandal is that he most likely knew the election was being manipulated via his platform but didn't say anything, because it was in "his" candidate's favor.