As organizations continue to accelerate their artificial intelligence capabilities, data center strategies are evolving rapidly to keep up with fluctuating computing demands. Deloitte research (detailed in our previous article, “Is your organization’s infrastructure ready for the new hybrid cloud?”) shows a range of infrastructure approaches organizations are exploring based on factors like cost, latency, and hardware needs. The possibilities range from a rush off the mainframe, to new hybrid cloud strategies, to an emerging market for AI factories. These factories could handle tasks ranging from standard inferencing to high-performance computing at tremendous scale.1
But for many leaders, building a resilient business and tech infrastructure for the age of AI could pose difficult questions. AI systems depend on large volumes of high-quality data and complex distributed architectures with many interdependent components. Ensuring data availability, integrity, real-time fault detection, robust runtime security, and rapid recovery is important for accurate predictions and effective fault tolerance, especially given the expanded attack surface and the need for low-latency responses.
What’s the right balance among on-premises, cloud, and other high-performance computing (HPC) solutions? How adaptable will the tech infrastructure need to be as AI workloads scale, given the need for synchronous or asynchronous coordination across orchestration layers? AI consumption and utilization patterns are still largely uncharted territory, potentially making it difficult for enterprises (and operators) to forecast needs and plan with confidence.
To better understand how computing workloads are expected to shift across infrastructure types such as mainframe, cloud, enterprise on premises, edge computing, and emerging technologies in the next 12 months, Deloitte surveyed 120 market operators—including data center providers, energy providers, and distributors—between March and April of 2025 (see methodology). The analysis explores projected changes in computing demand driven by AI workloads, the key factors influencing leaders’ decisions to shift workloads across different infrastructure types, and the steps leaders are taking to address the challenges of scaling their computing infrastructure to meet evolving needs for HPC.
Our survey asked leaders how their computing workload is expected to change in the next 12 months and beyond across tech environments (figure 1).
The data shows a clear trend: Respondents expect AI-driven workloads that demand a variety of processing capabilities to increase computing demand across a wide range of platforms. Every tech environment asked about in the survey is expected to see workloads increase by 20% or more in the next 12 months. AI workloads might include pretraining models, improving them through reinforcement learning or other post-training techniques such as chain of thought or reasoning, and then using them for inferencing tasks as AI is deployed at scale, especially agentic AI.
Respondents expect the biggest short-term spikes to come from emerging AI cloud providers (87%) and edge computing platforms (78%), outpacing on-prem data center growth by roughly 10 to 1 and 6 to 1, respectively. Public cloud and private cloud also both show notable anticipated increases. Although mainframe and on-prem data center workloads are also expected to increase in the next year, respondents appear to be reducing their reliance on them as they add capacity elsewhere, or reconfigure existing non-AI on-prem solutions to AI-optimized configurations. Almost a third of respondents say they plan to decrease or significantly decrease mainframe and on-prem workloads in the next 12 months.
While this may correspond with trends Deloitte has discussed elsewhere related to decommissioning mainframes, in the case of traditional data centers, our broader research has shown that organizations are taking multiple approaches to addressing the increased need for computing: reconfiguring existing data centers; reactivating decommissioned data centers; reimagining AI infrastructure with hyperscalers, niche providers, and new entrants; and reviewing GPU (graphics processing unit) and AI token utilization to trigger investments in new solutions.
What might these shifting computing demands mean for business leaders? They may need to empower infrastructure leaders to build smarter, more resilient, and more efficient environments, while adopting new approaches to security, governance, and talent management to better navigate disruptions that many companies may be feeling already. Scaling AI is expected to increase workloads, and some of the potential challenges include corralling that growth on the right platforms, ensuring that workloads don’t scale where they shouldn’t (like the mainframe), and phasing out legacy tech systems.
Diverse consumption models introduce complexities, which organizations often address through a mix of solutions.2
A few of these solutions are driving an engineering-led approach to rethinking control, tooling, and management planes to support varied consumption patterns. While not specific to any single client, Deloitte’s experience through its hybrid cloud infrastructure offering points to several hybrid models that combine cloud, edge, and on-prem solutions to help create a resilient AI tech backbone.
At the same time, as AI demand grows and scales, several factors appear to be driving organizational decisions to shift workloads off the cloud, and cost is the top motivator identified in our survey. The majority of respondents (55%) say they plan to incrementally move workloads from the cloud once their data-hosting and computing costs hit a certain threshold. Another 17% cite latency and/or security needs as their main reason for moving off the cloud (figure 2).
However, more than a quarter of respondents (27%) plan to stay on the cloud, even if it costs more. Some organizations may be skeptical that AI workloads will scale enough in the long term to justify a move away from the cloud, but they may be missing the bigger cost picture. Beyond data storage and computing cost (typically measured in GPUs and central processing units [CPUs]), organizations should also account for the cost of model usage and the added cost of inferencing,4 typically measured in AI tokens.
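To make the token-cost point concrete, consider a minimal back-of-the-envelope sketch in Python. Every price and volume below is an illustrative assumption, not a survey figure or market benchmark:

```python
# Back-of-the-envelope comparison of GPU rental cost vs. token-based inference
# cost. All prices and volumes are hypothetical, for illustration only.

GPU_HOURLY_RATE = 2.50           # assumed $/hour for one rented cloud GPU
GPU_HOURS_PER_MONTH = 720        # one GPU running around the clock

PRICE_PER_1K_INPUT_TOKENS = 0.0005   # assumed $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.0015  # assumed $ per 1,000 output tokens

REQUESTS_PER_MONTH = 10_000_000
AVG_INPUT_TOKENS = 800
AVG_OUTPUT_TOKENS = 200

# Raw compute rental for a single GPU
compute_cost = GPU_HOURLY_RATE * GPU_HOURS_PER_MONTH

# Model-usage cost, billed per token rather than per GPU-hour
token_cost = REQUESTS_PER_MONTH * (
    (AVG_INPUT_TOKENS / 1_000) * PRICE_PER_1K_INPUT_TOKENS
    + (AVG_OUTPUT_TOKENS / 1_000) * PRICE_PER_1K_OUTPUT_TOKENS
)

print(f"Monthly compute rental (1 GPU): ${compute_cost:,.0f}")  # -> $1,800
print(f"Monthly inference token cost:   ${token_cost:,.0f}")    # -> $7,000
```

Even at modest per-token prices, inference volume can dominate raw compute rental, which is why model usage deserves its own line in any total-cost-of-ownership model.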
As organizations learn more and models evolve, some leaders are finding that understanding cost dynamics calls for a thoughtful approach to financial operations across the entire data and model life cycle. This goes beyond cloud cost to include the cost of data hosting, networking, inferencing, and latency for HPC across hybrid infrastructures. As solutions mature, leaders should stay nimble, making decisions that preserve flexibility in a fast-moving environment where new options are emerging for data hosting, computing, networking, and inferencing.
Technology resiliency spans the full life cycle of building, operating, and running AI infrastructure. It may depend on systems being able to operate amid both expected shifts and sudden disruptions, and it should involve more than business continuity: proactive planning for infrastructure change matters just as much.
AI magnifies that challenge for some. These systems don’t just run on tech—they’re embedded in complex digital ecosystems. As their role in the organization grows, so does the need for infrastructure that can keep pace.
Leaders can potentially strengthen hybrid cloud machine learning agility by taking some deliberate steps to prepare. Consider the following.
Some companies are building on-prem GPU farms for both training and inference needs. Others are using models that are application programming interface–enabled and require no on-prem infrastructure. The answer could be to use a combination—for instance, an open-source or open-weight large language model on-prem that could be fine-tuned on a private cluster.
Rather than locking into one setup, infrastructure should allow models to be hosted and workloads to move based on context, across systems, users, and even multi-agent operations, while maintaining the right guardrails and access controls. Consider investing in dedicated hardware, such as large clusters, as needs become understood. A pragmatic, incremental approach that leverages both cloud and alternatives, grounded in real-world demands, could better deliver the agility, cost-efficiency, and control that AI-driven high-performance computing typically requires.
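As one hypothetical illustration of context-aware placement, the sketch below routes each workload to an environment based on simple data-sensitivity, latency, and utilization guardrails. The environment names, thresholds, and policy rules are illustrative assumptions, not a prescribed architecture:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    sensitive_data: bool      # regulated or confidential data involved?
    max_latency_ms: int       # latency budget for responses
    monthly_gpu_hours: int    # estimated sustained GPU consumption

def place(w: Workload) -> str:
    """Toy placement policy: guardrails first, then latency, then cost."""
    if w.sensitive_data:
        return "on-prem cluster"       # keep regulated data inside the perimeter
    if w.max_latency_ms < 20:
        return "edge platform"         # tight latency budgets favor edge nodes
    if w.monthly_gpu_hours > 5_000:
        return "dedicated AI cluster"  # sustained heavy use justifies owned capacity
    return "public cloud"              # default: elastic, pay-as-you-go

workloads = [
    Workload("fraud-scoring", sensitive_data=True, max_latency_ms=10, monthly_gpu_hours=400),
    Workload("batch-summarization", sensitive_data=False, max_latency_ms=5_000, monthly_gpu_hours=8_000),
    Workload("chat-pilot", sensitive_data=False, max_latency_ms=300, monthly_gpu_hours=150),
]
for w in workloads:
    print(f"{w.name}: {place(w)}")
```

A production policy would also weigh data gravity, egress fees, and per-environment model availability, but the principle holds: encode the guardrails once so workloads can move without renegotiating them each time.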
While many economic considerations should go into calculating the total cost of ownership across AI infrastructure (e.g., CPUs, GPUs, networking, and AI token costs), for organizations concerned about high cloud computing costs, our survey asked at what point they’d consider moving workloads off the cloud and buying their own GPU rack (figure 3).
The largest share of respondents (30%) say they won’t consider moving workloads off the cloud until cloud costs reach 1.5 times what they’d pay for an alternative, that is, until the cloud carries a 50% premium over owning a GPU rack. In other words, they appear to want a guaranteed day-one return on investment before making a switch. This is potentially concerning given that this measure is only one of several inputs to the total cost of ownership across a mature AI tech stack.6
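The arithmetic behind that threshold is worth spelling out. In the hypothetical sketch below (prices, utilization, and amortization period are all illustrative assumptions), a cloud-to-rack cost ratio of roughly 1.5 translates to cutting about a third off the cloud bill, equivalently a 50% cloud premium over the rack:

```python
# Hypothetical breakeven math for renting cloud GPUs vs. amortizing an owned
# GPU rack. Every figure below is an illustrative assumption.

CLOUD_RATE_PER_GPU_HOUR = 3.50   # assumed on-demand cloud price
RACK_CAPEX = 300_000             # assumed purchase price of an 8-GPU rack
RACK_OPEX_PER_YEAR = 40_000      # assumed power, cooling, and maintenance
AMORTIZATION_YEARS = 3
NUM_GPUS = 8
UTILIZATION = 0.85               # fraction of hours the GPUs are actually busy

busy_gpu_hours = NUM_GPUS * 8_760 * UTILIZATION          # 8,760 hours per year

cloud_annual = CLOUD_RATE_PER_GPU_HOUR * busy_gpu_hours  # pay only for busy hours
rack_annual = RACK_CAPEX / AMORTIZATION_YEARS + RACK_OPEX_PER_YEAR

ratio = cloud_annual / rack_annual    # the "1.5x" threshold from the survey
savings_vs_cloud = 1 - 1 / ratio      # fraction of cloud spend saved by switching

print(f"Cloud annual cost:   ${cloud_annual:,.0f}")   # -> $208,488
print(f"Rack annual cost:    ${rack_annual:,.0f}")    # -> $140,000
print(f"Cloud-to-rack ratio: {ratio:.2f}x")           # -> 1.49x
print(f"Switching saves:     {savings_vs_cloud:.0%} of cloud spend")  # -> 33%
```

Note what the sketch leaves out: networking, staffing, hardware refresh risk, and AI token costs. It also shows how sensitive the ratio is to utilization; at lower utilization the rack advantage shrinks quickly, which is one reason a guaranteed day-one return on owned hardware is far from assured.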
Cloud solutions offer on-demand access through subscription models, possibly reducing upfront risks and enabling organizations to quickly scale to meet fluctuating demand. This can include access to the latest hardware, such as specialized GPUs for AI, machine learning, high-performance computing, and visualization tasks. Cloud solutions can also help simplify experimentation by making it easy to spin up GPU clusters for development, testing, and training.
But at a certain stage, this approach could come with a cost. Risk-averse organizations may use the cloud for pilot inferencing to avoid capital expenditure obligations and maintain a low operational expenditure experimentation model. These organizations, however, may end up carrying the cost of a high cloud bill longer than necessary and miss the opportunity to invest in high-performance infrastructure that could drive differentiation and competitiveness with AI business capabilities.
Our survey suggests that some organizations may be moving ahead early based on GPU indicators, while others may be better equipped to account for the total cost of ownership. Twenty-four percent of respondents say they plan to move off the cloud when cloud costs exceed the cost of alternatives by 25% to 50%.
Given that cost is the top driver for most organizations surveyed, leaders can consider a few important actions.
Organizations may need to approach AI infrastructure differently based on their scale and needs. At one end of the spectrum, API-only approaches allow access to high-performance computing infrastructure without direct investment, though leaders should still account for model usage costs. A more moderate approach can involve investing in hybrid AI infrastructure that brings together a combination of traditional on-prem, public/private cloud, edge, neoclouds, or dedicated AI clusters that the organization owns and operates. These configurations may vary based on current infrastructure and future requirements.
Some large-scale organizations are building their own AI factories (data centers optimized for processing AI workloads). While these require a larger up-front investment, purpose-built hardware and models can lower lifetime usage costs and may even open up new revenue streams.7 For example, Deloitte’s Center for Technology, Media & Telecommunications reported that 15 global telecoms in over a dozen countries brought new AI factories online in 2024, with more following suit. Whether an organization is experimenting with AI or operating at scale, infrastructure decisions should align with workload and performance needs.
As enterprises navigate the evolving demands across their tech estate, data center operators acknowledge the challenges. According to our survey, respondents’ two biggest concerns as compute demand grows are power and grid capacity constraints (70%) and cyber or physical security (63%).8
As the computing paradigm evolves, 78% of respondents suggest that technological innovation could help address these constraints.
Several factors could reshape how organizations consume GPUs, helping to manage these supply-and-demand dynamics.
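One such factor is model right-sizing. As a hypothetical sketch, a simple cascade can route routine requests to a smaller model and reserve the large, GPU-hungry model for requests that need it; the model names and the difficulty heuristic below are illustrative assumptions:

```python
# Toy model cascade: send requests that look simple to a small model and
# reserve the large, GPU-hungry model for the rest. The model names and the
# difficulty heuristic are illustrative assumptions.

SMALL_MODEL = "small-7b"    # cheap to serve, runs on modest GPUs
LARGE_MODEL = "large-70b"   # expensive, reserved for harder requests

REASONING_CUES = ("explain", "analyze", "compare", "step by step")

def looks_simple(prompt: str) -> bool:
    """Crude heuristic: short prompts without reasoning cues count as simple."""
    lowered = prompt.lower()
    return len(prompt.split()) < 50 and not any(cue in lowered for cue in REASONING_CUES)

def choose_model(prompt: str) -> str:
    return SMALL_MODEL if looks_simple(prompt) else LARGE_MODEL

print(choose_model("Translate 'good morning' to French."))         # -> small-7b
print(choose_model("Analyze the tradeoffs of edge inferencing."))  # -> large-70b
```

Production routers typically rely on learned classifiers or confidence signals rather than keyword matching, but the idea is the same: even a crude cascade can shift a meaningful share of traffic off the largest GPUs.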
While approaches like managing CPU and GPU usage and using smaller models can be beneficial, their impact may be limited for some organizations. Eventually, organizations may have to consider whether a cloud-centric approach gives them the resilience they need or whether they need to invest in managing their data, network, and infrastructure with new alternatives based on their inferencing needs as well as their computing needs.
As enterprises scale AI workloads, finding the right balance between innovation and risk management could be essential. Long-term success could depend on deep workload visibility, high-quality data, and strong security protocols. Leaders who prioritize these capabilities may be better positioned to avoid costly missteps and unlock the full potential of AI.
The Deloitte Research Center for Energy & Industrials conducted a survey in April 2025 to identify US data center and power company challenges, opportunities, and strategies, and to benchmark their infrastructure development. The survey’s 120 respondents included 60 data center executives and 60 power company executives, and the questionnaire covered infrastructure buildout challenges, resource mix to meet future energy consumption, workforce issues, AI workload planning, drivers of load growth, and investment priorities. The Center for Integrated Research focused on the 60 data center respondents and analyzed how these leaders expected computing demand to increase or decrease in the next 12 months based on AI workloads, what might cause them to move workloads from the cloud, and the specific cost-per-GPU inflection points that would trigger such a move.