Ethan Ding: Tokens Are Getting More Expensive

every ai company knows usage-based pricing would save them. they also know it would kill them. while you're being responsible with $0.01/1k tokens, your vc-funded competitor offers unlimited for $20/month.
i keep seeing founders point to the "models will be 10x cheaper next year!" like it’s a life raft. sure. and your users will expect 20x more from them. the goalpost is sprinting away from you.
the 10x cost reduction is real, but only for models that might as well be running on a commodore 64.

And:

so this is the first faulty pillar of the “costs will drop” strategy: demand exists for "the best language model," period. and the best model always costs about the same, because that's what the edge of inference costs today.

This link post is a good opportunity to state a few things that have been on my mind lately.

  1. Token prices have come down considerably in the past year or two. However, we're also generating a lot more tokens, so actual costs aren't dropping like I hoped they would. In fact, "cost per token" isn't the most useful metric for measuring LLM costs these days. For example, Grok 4 is cheaper per token than Claude Sonnet 4, but it's way, way cheaper to actually use Sonnet because Grok burns a ton of tokens to do anything.
  2. Thinking models, which took off after DeepSeek R1 came out earlier this year, have driven much of the performance increase in newer models, but by their nature they also generate a lot more tokens to deliver an answer.
  3. Despite the "no one wants these" complaints I sometimes see, user engagement is continuing to rise, and power users are asking their LLM-powered tools to do more than they used to.
  4. The cost of no-longer-cutting-edge models drops once their replacements are out, but very few people keep using them. I think this speaks to the simple fact that today's LLMs are barely good enough, and we will always reach for the latest and greatest just to get something a bit better. I do wonder if we're finally hitting a "good enough" spot for frontier models: GPT-4o is still preferred by many over GPT-5, and I know several developers who still use Gemini 2.0 Flash because it's hella cheap and totally sufficient for most non-code tasks.
  5. High-quality local processing can't get here soon enough. These costs are unsustainable, and while pay-per-use is the only truly sustainable pricing plan for a lot of these LLM apps, consumers have shown time and time again that they would much rather pay a fixed monthly fee, even if that means they sometimes overpay. I don't see how LLM-powered tools succeed in the consumer space with usage-based pricing.
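The "cheaper per token but pricier per task" point in item 1 is easy to sketch with a little arithmetic. All the numbers below are made up for illustration (they are not real prices for Grok, Sonnet, or any other model); the only thing that matters is the relationship between per-token price and token volume:

```python
def task_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Dollar cost of one task, given token counts and per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Hypothetical Model A: lower per-token prices, but verbose
# (thinking-style models emit far more output tokens per answer).
cost_a = task_cost(input_tokens=5_000, output_tokens=40_000,
                   price_in_per_m=2.00, price_out_per_m=10.00)   # $0.41

# Hypothetical Model B: higher per-token prices, but concise.
cost_b = task_cost(input_tokens=5_000, output_tokens=4_000,
                   price_in_per_m=3.00, price_out_per_m=15.00)   # $0.075

# The "cheaper" model costs over 5x more per task.
print(f"A: ${cost_a:.3f}  B: ${cost_b:.3f}")
```

This is why "cost per token" is a misleading headline number: the thing you actually pay for is tokens-per-task times price-per-token, and thinking models move the first factor far more than price cuts move the second.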