
The practice of tokenmaxxing seems to be dying out, even before I had a chance to write about it. Good riddance. Burning tokens to create the appearance of productivity was fated to last only until the accountants found out about it, and the strictest of all accountants is one’s personal checkbook. What got many developers thinking about the cost of AI was the change in GitHub Copilot’s usage charges. Copilot went from a monthly fee with unlimited use to a monthly fee that buys a limited number of credits, which are used to pay the AI provider of your choice. One credit is equivalent to US$0.01; once you’ve used up your credits, you can upgrade your account or pay for more credits as you go.
The question isn’t why this didn’t happen earlier; it’s why this happened now. Tokenmaxxing is both the creation and the victim of two large-scale trends in AI. First, starting with OpenAI, the major AI providers were all playing a blitzscaling game that prioritized user growth over profitability. Giving AI services away for free got you more users, and in the long run, scalers would figure out how to make money from end-user fees, selling user data, or advertising. This process inevitably results in enshittification, and is still very much the road we’re on.
Second, token usage exploded late in 2025. The appearance of “reasoning models,” which use tokens to maintain an internal conversation in the course of solving a problem, increased the number of tokens used to respond to each prompt. Reasoning tokens are a model’s conversation with itself about possible responses to the prompt, and are often more numerous than the prompt and response themselves. Whether or not users see the reasoning process (often they don’t), reasoning tokens add to the bill. They’re frequently counted as “output tokens” because they’re generated by the model, and output tokens are more expensive than input tokens.
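To make the billing effect concrete, here is a minimal sketch of how reasoning tokens inflate the cost of a single request. The per-million-token prices and the token counts are illustrative assumptions, not any provider’s actual rates; the only thing taken from the text is that reasoning tokens are billed as output tokens.

```python
# Hypothetical prices in US dollars per million tokens (assumptions for illustration).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, visible_output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """Cost of one request when reasoning tokens are billed at the output rate."""
    billed_output = visible_output_tokens + reasoning_tokens
    total = input_tokens * INPUT_PRICE_PER_M + billed_output * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Same prompt and same visible answer; only the hidden reasoning differs.
plain = request_cost(input_tokens=1_000, visible_output_tokens=500)
reasoned = request_cost(input_tokens=1_000, visible_output_tokens=500,
                        reasoning_tokens=4_000)

print(f"without reasoning: ${plain:.4f}")
print(f"with reasoning:    ${reasoned:.4f}")
```

Under these assumed numbers, the visible prompt and response are unchanged, yet most of the bill comes from tokens the user never sees.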
The appearance of agents also multiplied the rate at which users consumed tokens. In May 2025, Simon Willison quoted a definition of an agent from Anthropic’s Hannah Moran: “Agents are models using tools in a loop.” The Tredence blog writes: “The agent loop is a repeating cycle in which the AI reads the current data, thinks through what it means, chooses an action, carries it out, checks what happens and starts over.” If you’ve ever watched Claude Code, OpenClaw, or any other agent work, you’ve seen that a single request can become many calls to a model, each one using hundreds of tokens, if not thousands. In addition to the current request, one agent-generated invocation can contain the task’s entire accumulated context and relevant documents. Between reasoning tokens and agents, token usage goes up by a factor of hundreds.
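The compounding effect of resending accumulated context can be sketched in a few lines. This is a simplified model, not a description of how any particular agent works: it assumes every call resends the whole context and that each step appends a fixed number of tokens. The function name and all numbers are hypothetical.

```python
def agent_input_tokens(base_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens billed across an agent run in which every model call
    resends the entire accumulated context (a simplifying assumption)."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                  # the whole context is sent on each call
        context += tokens_added_per_step  # tool results and model output are appended
    return total


single_call = agent_input_tokens(2_000, 1_500, steps=1)
agent_run = agent_input_tokens(2_000, 1_500, steps=20)

print(f"one call:      {single_call:,} input tokens")
print(f"20-step agent: {agent_run:,} input tokens")
print(f"multiplier:    {agent_run / single_call:.1f}x")
```

Because the context grows on every step, total input tokens grow roughly with the square of the number of steps. That is how a modest 20-step task in this toy model ends up more than a hundred times as expensive as a single call.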
The rise in token usage might not be an issue if it results in problems being solved and tasks completed more effectively. But it collides with the loss-leader pricing of the blitzscalers; their willingness to operate at a loss to gain control of a market has limits. Regardless of whether the number of AI users is growing, the amount of computation, and therefore cost, per user grows as the use of agents increases. Reasoning models increased token usage; agents compounded the problem; and that led to price increases.1 Microsoft/GitHub doesn’t want to pay Copilot customers’ AI bills. We haven’t yet seen across-the-board price increases from the AI providers themselves. But we have seen GitHub’s token credits, and we have seen Anthropic and OpenAI price more capable models significantly higher than older or less capable models. Fable is twice as expensive as Opus 4.8, and while some writers have called this pricing “incredible,” that’s probably because they were expecting an even greater increase. While Fable can delegate tasks to Anthropic’s less expensive models, most early users note that with Fable, token use goes up rather than down. Anthropic’s switch to token-based billing for its agent SDK (currently on hold) is another signal that the days of inexpensive AI are coming to an end. OpenAI’s story is similar: GPT 5.5 costs twice as much as GPT 5.4 per million tokens.
It’s also important to take capacity into account. Huge data centers have been in the news, but these data centers haven’t been built yet. More important, the electrical infrastructure needed to support these data centers (transmission lines, generators) hasn’t been built either, and that’s not an investment over which AI companies have much control. They can build their own power generation facilities on a data center campus, but that’s a huge investment in technologies that they’re not familiar with. And even if you generate power locally, you need other kinds of infrastructure: rail for coal, pipelines for gas. This isn’t (yet) an essay about data center power consumption and its consequences, but it’s another factor that limits increased token usage. We’ve seen Anthropic’s outages blamed on capacity, and Anthropic has responded by leasing unused data center capacity from SpaceX. But the other way to respond to increased demand that can’t be met by existing capacity is to increase prices, limiting customers to those who can afford to pay. That increase is being noticed by managers, accountants, and independent developers.
Token optimization and accountability are the inevitable consequence of upward pressure on token cost. One way to build accountability is through better governance, a route Bennie Haelen describes in “The Subsidy Ended: What Tool-Using Agents Actually Cost.” Better governance is achieved by building an observability layer that lets you see exactly what the agents and models are doing. With a well-designed observability layer, you can see whether the data sent to the model is growing with each invocation, whether the model is using appropriate tools, whether tools are being called repeatedly, and plenty of other information that can tell you whether your agent is working efficiently.
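As a rough illustration of the kind of checks such a layer enables, the sketch below inspects a log of model invocations for two of the signals mentioned above: context that grows with each call and tools that are called repeatedly. The `Invocation` record, its fields, and the sample log are hypothetical; a real observability layer would define its own schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """One model call as recorded by a hypothetical observability layer."""
    model: str
    input_tokens: int
    output_tokens: int
    tool_calls: list[str] = field(default_factory=list)  # e.g. "read_file:main.py"


def context_growth(log: list[Invocation]) -> int:
    """Change in input tokens between the first and last call in the log."""
    if not log:
        return 0
    return log[-1].input_tokens - log[0].input_tokens


def repeated_tool_calls(log: list[Invocation], threshold: int = 2) -> dict[str, int]:
    """Tool calls (name plus argument) seen at least `threshold` times."""
    counts = Counter(call for inv in log for call in inv.tool_calls)
    return {call: n for call, n in counts.items() if n >= threshold}


# Made-up log of three calls from a single agent task.
log = [
    Invocation("model-a", 2_000, 300, ["read_file:main.py"]),
    Invocation("model-a", 3_800, 250, ["read_file:main.py", "run_tests"]),
    Invocation("model-a", 5_900, 400, ["read_file:main.py"]),
]

print("context growth:", context_growth(log))
print("repeated tools:", repeated_tool_calls(log))
```

In this made-up log, the input grows by several thousand tokens over three calls and the same file is read three times. Both are the sort of pattern that suggests an agent is spending tokens without making progress.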
Another piece of token accountability is understanding which models are running your agent’s requests. General-purpose reasoning models range from expensive high-performance models like Claude Fable or Opus 4.8 to models like Gemma 4 26B that can run on a well-equipped laptop, and some models that are even smaller. While it’s tempting to say “I want the best; I’ll run Opus 4.8 or Fable with maximum reasoning,” most requests don’t require that level of reasoning or expense. Agents will be able to decide which model is best for processing each request. Fable can delegate, and we expect other frontier providers to follow as models incorporate agent capabilities. And there’s an active world of open models outside of the frontier AI providers. Vicki Boykis writes that models running locally now work almost as well as frontier models. Tools like OpenRouter give you a model-independent way of routing requests to different models, including open models that run locally. OpenRouter can be integrated with OpenClaw, Claude Code, Cursor, Codex, and other agents to provide intelligent routing.
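The idea behind cost-aware routing can be sketched without reference to any particular product. The example below is not OpenRouter’s API; it is a toy router that picks the cheapest model meeting a request’s required capability. The model names, the capability scale, and the prices are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int            # made-up scale: 1 = small local model, 3 = frontier model
    price_per_m_output: float  # hypothetical US$ per million output tokens


# Invented catalog; real names, tiers, and prices will differ.
CATALOG = [
    Model("local-small", capability=1, price_per_m_output=0.0),
    Model("mid-tier", capability=2, price_per_m_output=2.0),
    Model("frontier", capability=3, price_per_m_output=30.0),
]


def route(required_capability: int, catalog: list[Model] = CATALOG) -> Model:
    """Return the cheapest model whose capability meets the requirement."""
    candidates = [m for m in catalog if m.capability >= required_capability]
    if not candidates:
        raise ValueError(f"no model offers capability {required_capability}")
    return min(candidates, key=lambda m: m.price_per_m_output)


for need in (1, 2, 3):
    print(need, "->", route(need).name)
```

The hard part in practice is estimating the required capability for each request. This sketch takes it as given, which is exactly the judgment a delegating agent or routing tool has to make.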
Tokenmaxxing is dying. It will no doubt take time for its vestiges to die away, and there will always be developers who think they can game the path to a promotion, along with managers who insist on being “all in” with AI. But spending tokens responsibly is now the norm, whether you pay with your own checkbook or a company account. Token optimization will only become more important as per-token prices increase. They undoubtedly will.
