Partially related: is charging by token sustainable for LLM shops? If the compute requirements go up quadratically with context length, doesn't that mean cost should as well?
Typically requests are binned by context length so that they can be batched together. So you might have a 10k bin, a 50k bin, and a 500k bin, and then you drop context past 500k. Every request in a bin is padded to the bin's cap and batched at that size, so the cost is fixed per bin.
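To make that concrete, here's a minimal sketch of the binning idea. The bin boundaries and the `assign_bin` helper are hypothetical, just to illustrate; real serving stacks do this with more sophisticated scheduling/continuous batching.

```python
# Hypothetical bin caps (max context per bin), per the 10k/50k/500k example.
BINS = [10_000, 50_000, 500_000]

def assign_bin(context_len: int) -> int:
    """Return the smallest bin cap that fits the request.

    Context beyond the largest bin is dropped (truncated to 500k).
    """
    for cap in BINS:
        if context_len <= cap:
            return cap
    return BINS[-1]

# Requests in the same bin get padded to the bin cap and batched together,
# so per-request compute (and thus cost) is fixed for everything in a bin.
requests = [3_000, 42_000, 120_000, 750_000]
print([assign_bin(n) for n in requests])  # [10000, 50000, 500000, 500000]
```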
Makes sense, and since each model has a max context length, they could set a per-token price that assumes full context for that model, i.e. price for the worst case.
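A hedged back-of-envelope version of that worst case: each generated token attends to at most `max_context` prior tokens, so the worst-case per-token attention cost scales roughly linearly with the model's max context (the quadratic total is the sum of those linear per-token costs). All numbers below are made up for illustration.

```python
def worst_case_relative_cost(max_context: int, base_context: int = 10_000) -> float:
    """Relative per-token attention cost at full context vs. a base context.

    Assumes per-token attention work is proportional to the number of
    prior tokens attended over (a simplification; ignores MLP FLOPs, etc.).
    """
    return max_context / base_context

for ctx in (10_000, 50_000, 500_000):
    print(f"{ctx:>7,} ctx -> {worst_case_relative_cost(ctx):.0f}x per-token attention cost")
    # 10,000 ctx ->  1x;  50,000 ctx -> 5x;  500,000 ctx -> 50x
```

So a flat per-token price calibrated to the largest bin would cover the worst case, at the expense of overcharging short-context requests.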