Key takeaways

  • Traditional total cost of ownership (TCO) models may need a refresh: The emergence of AI tokens as a primary unit of value suggests that legacy TCO frameworks alone are no longer sufficient for managing AI expenditures.
  • Hybrid consumption models dominate: Enterprises will likely leverage a combination of SaaS, APIs, and self-hosted infrastructure to power AI agents; each approach has distinct cost dynamics to understand and manage.
  • FinOps discipline by default: Real-time monitoring, forecasting, and spend management are key to controlling costs and maximizing ROI.
  • Leadership alignment at the core: Strategic, technical, and financial leadership should be unified to drive sustainable AI adoption and value realization.

AI is now the fastest-growing expense in corporate technology budgets, with some firms reporting that it consumes up to half of their IT spend. Cloud computing bills are rising sharply—up 19% in 2025¹ for many enterprises—as generative AI becomes central to operations. Yet, as costs mount, returns can remain elusive. According to Deloitte’s 2025 US Tech Value survey, nearly half of leaders expect it will take up to three years to see ROI from basic AI automation, and only 28% of global finance leaders report clear, measurable value from their AI investments.

This disconnect is not just a financial headache—it’s a strategic reckoning. For many organizations, the imperative to adopt AI is less about immediate returns and more about staving off existential threats or maintaining competitive parity. In these cases, the focus must shift from whether AI delivers value to how its economics are measured and managed for organizations to thrive in a structurally different environment. As such, enterprise technology and business leaders face a new economic reality, defined not by traditional metrics but by the volatile, nonlinear dynamics of token-based AI consumption.

Tokens: The true currency of AI

Unlike previous technology waves where costs were tied to subscriptions or virtual machines, AI economics now revolve around tokens—the fundamental unit of AI work. Every interaction, from model training to inference, is measured in tokens, or small chunks of data that models process, making costs inherently variable and often unpredictable. Key drivers of this volatility include:

  • Nonlinear demand: Complex reasoning models can improve performance but consume more tokens than models running simple reasoning tasks.
  • Fluctuating usage: Token use can fluctuate with experimentation, workload design, and prompt engineering.
  • Variable pricing: The cost per million tokens changes with model capabilities and infrastructure efficiency.
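The combined effect of these drivers can be sketched with a simple cost function. The prices and token counts below are illustrative assumptions, not vendor quotes; the point is how quickly per-query cost diverges once reasoning depth and context length grow:

```python
def query_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost of one query in USD, given per-million-token prices."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# A simple lookup vs. a multi-step reasoning query on the same (hypothetical) model:
simple = query_cost(400, 150, price_in=3.00, price_out=15.00)
multi_step = query_cost(6_000, 4_000, price_in=3.00, price_out=15.00)

print(f"simple lookup:       ${simple:.4f}")
print(f"multi-step reasoning: ${multi_step:.4f}")
print(f"cost ratio: {multi_step / simple:.0f}x")
```

At these assumed prices, the reasoning-heavy query costs roughly 23 times the simple one—the nonlinearity that makes token spend hard to forecast from user counts alone.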

Token costs, meanwhile, are shaped by a cascade of technical decisions:

  • Compute: Modern GPUs and high-bandwidth memory can shorten “time per token” but come at a price premium. 
  • Storage: High-speed storage is important; legacy systems can add latency to GPU processing and inflate per-token costs.
  • Networking: Ultra-low-latency interconnects can cut idle cycles and lower costs, while traditional connectivity can drive them higher.
  • Power and facilities: Next-generation GPU racks can draw immense power and require specialized infrastructure, costs that are embedded in every token consumed.
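For self-hosted inference, these infrastructure layers can be rolled into a fully loaded cost per million tokens. The monthly figures and throughput below are illustrative assumptions for a single hypothetical 8-GPU server, chosen only to show the mechanics of the calculation:

```python
# Assumed monthly costs (USD) for one 8-GPU inference server.
monthly_costs = {
    "gpu_compute": 25_000,            # amortized hardware or lease
    "storage": 3_000,                 # high-speed tiers feeding the GPUs
    "networking": 4_500,              # low-latency interconnects
    "power_cooling_facilities": 10_000,
    "software_stack": 7_500,
}

tokens_per_month = 40_000_000_000     # 40B tokens at assumed sustained utilization

total = sum(monthly_costs.values())
cost_per_million = total / (tokens_per_month / 1_000_000)
non_gpu_share = 1 - monthly_costs["gpu_compute"] / total

print(f"fully loaded: ${cost_per_million:.2f} per million tokens")
print(f"non-GPU share of cost: {non_gpu_share:.0%}")
```

Even in this toy model, half the per-token cost sits outside the GPUs—in storage, networking, power, facilities, and software.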

Leaders who understand these dynamics can more effectively manage AI as a true economic system, aligning infrastructure and model choices with business priorities, optimizing spend while delivering high-quality outcomes.

The paradox: Falling prices, rising consumption

While the unit price of AI tokens is falling, overall enterprise spending on and scaling of AI systems is rising. The number of users, complexity of models, and intensity of workloads will likely drive greater token consumption and, consequently, higher costs. 
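This dynamic is easy to see with arithmetic. Under the assumed trajectory below—unit price halving each year while consumption triples as use cases scale—total spend climbs even as tokens get dramatically cheaper:

```python
# Illustrative assumption: unit price halves yearly, consumption triples yearly.
price_per_m = 10.0    # USD per million tokens, year 0
tokens_m = 500.0      # millions of tokens consumed per month, year 0

for year in range(4):
    spend = price_per_m * tokens_m
    print(f"year {year}: ${price_per_m:5.2f}/M tokens x {tokens_m:8,.0f}M = ${spend:,.0f}/mo")
    price_per_m *= 0.5
    tokens_m *= 3
```

Monthly spend rises from $5,000 to $16,875 over three years despite an 87.5% drop in unit price—a Jevons-style effect that budget owners should anticipate.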

As AI workloads scale, the underlying mechanics of a new AI economy emerge, with spending likely falling into different buying patterns depending on how organizations consume intelligence:

  • Generating through packaged software abstracts tokens almost entirely. Leaders see a predictable subscription or per-seat fee, but gain little transparency into token consumption or efficiency. 
  • Consuming through APIs makes tokens explicit. Every query is metered, billed and exposed. This can bring transparency but also volatility: costs rise based on workload design, prompt length, and hidden choices of infrastructure providers.
  • Running on owned infrastructure. Bringing AI in-house means token economics are fully internalized. Tokens can become the outcome of decisions about GPUs, storage tiers, networking and energy contracts. The emerging shorthand for this strategy: the AI factory.

A Deloitte simulation, set up to isolate how hosting choices, AI model selection, and usage maturity interact to drive token consumption and total cost (scaled in eight-GPU increments), found: 

  • An on-premises AI factory can be the most cost-effective option once token production reaches a threshold.
  • Over three years, an AI factory can deliver more than 50% cost savings compared to both API-based and neocloud solutions. 
  • Approximately 50% of the AI factory cost can be attributed to factors other than GPUs when factoring in networking, power and cooling, facilities, and the software stack. 
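The threshold logic behind the first finding can be sketched as a simple break-even model. All figures below are illustrative assumptions, not the Deloitte simulation's inputs: the API has no fixed cost but a high per-token price, while the factory has a large fixed cost and a low marginal one:

```python
api_price_per_m = 5.00           # assumed USD per million tokens via API
factory_fixed_per_mo = 60_000    # assumed amortized infrastructure, power, staff
factory_variable_per_m = 0.50    # assumed marginal cost per million tokens self-hosted

def monthly_cost_api(tokens_m):
    """Monthly cost (USD) of consuming tokens_m million tokens via API."""
    return api_price_per_m * tokens_m

def monthly_cost_factory(tokens_m):
    """Monthly cost (USD) of producing tokens_m million tokens in-house."""
    return factory_fixed_per_mo + factory_variable_per_m * tokens_m

# Volume at which the factory becomes cheaper than the API:
breakeven_m = factory_fixed_per_mo / (api_price_per_m - factory_variable_per_m)
print(f"break-even: {breakeven_m:,.0f}M tokens/month")
```

Below the break-even volume the API wins; above it, every additional token widens the factory's advantage—which is why the savings compound over a multi-year horizon.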

To prevent runaway costs, leaders should strive to optimize what they use. The following approaches can be helpful:

  • Right-size models: Deploy smaller, fine-tuned models for domain-specific tasks to minimize unnecessary token consumption. Embrace open-source models where appropriate.
  • Streamline design: Limit context windows and employ algorithmic techniques such as early stopping and prompt truncation.
  • Embed governance: Implement real-time monitoring, budget alerts, and chargebacks to business units as essential guardrails.
  • Adopt FinOps practices: Forecast token demand, enforce ROI thresholds, and approve only those projects that meet defined economic criteria.
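A minimal sketch of the governance and FinOps bullets above—metering usage, charging it back to business units, and alerting before budgets are breached. Unit names, budgets, and thresholds are assumptions for illustration:

```python
from collections import defaultdict

# Assumed monthly token budgets (USD) per business unit.
BUDGETS_USD = {"marketing": 20_000, "support": 35_000, "r_and_d": 50_000}
ALERT_THRESHOLD = 0.8  # warn at 80% of monthly budget

spend = defaultdict(float)

def record_usage(unit, tokens, price_per_million):
    """Meter a workload's token usage and charge it back to its unit."""
    spend[unit] += tokens / 1_000_000 * price_per_million
    if spend[unit] >= BUDGETS_USD[unit] * ALERT_THRESHOLD:
        print(f"ALERT: {unit} at {spend[unit] / BUDGETS_USD[unit]:.0%} of budget")

record_usage("support", 2_000_000_000, 15.00)  # 2B tokens at $15/M = $30,000
```

In a real deployment this logic would sit behind the provider's usage metering APIs and feed dashboards and forecasts; the sketch only shows the chargeback-and-alert pattern.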

The leadership imperative: Govern AI as an economic system

AI cannot be managed with outdated cost models. Business leaders should treat AI economics with the same rigor as energy or capital allocation, recognizing tokens as the new currency. Hybrid infrastructure and FinOps are key to sustainable AI adoption, enabling organizations to deploy workloads where they can be most economically and strategically advantageous. Fluency in token economics will increasingly distinguish organizations that can scale AI confidently and convert consumption into measurable enterprise value.

Read the full report: The pivot to tokenomics: Navigating AI’s new spend dynamics.

This article originally appeared in Deloitte Executive Perspectives in CIO Journal from The Wall Street Journal on Jan. 14, 2026. The Wall Street Journal News Department was not involved in the creation of this content.

BY

Nicholas Merizzi

United States

Tim Smith

United States

Nitin Mittal

United States

Diana Kearns-Manolatos

United States

Endnotes

  1. George Fitzmaurice, "Cloud spending projected to grow 19% this year on back of strong 2024," IT Pro, February 21, 2025, accessed November 18, 2025.

Acknowledgments

Cover image by: AdobeStock
