Infrastructure That Earns Its Keep: Running Agents Without the 3x Cost Curve

Every technology leader is staring at the same arithmetic. AI workloads are expanding, and with them the cost of the infrastructure underneath. McKinsey projects that IT infrastructure costs will rise two to three times by 2030 while budgets stay flat (McKinsey, 2026). At the same time the demand for speed keeps climbing, and outages carry more financial and reputational weight than they ever have. The old way of running infrastructure, ticket by ticket and hand to hand, cannot absorb any of it.

Agentic AI is the reason the pressure is rising, and it is also the release valve. The same technology that is driving compute and storage demand can automate the routine work of running infrastructure. McKinsey finds that agentic AI can take on 60 to 80 percent of routine infrastructure work over time, cutting run-rate cost 20 to 40 percent in initial deployments (McKinsey, 2026). The catch is that the value only shows up if infrastructure itself is rebuilt to be run by agents, not just by people. And most organisations are not there. While 62 percent are experimenting with agents, no more than 10 percent in any given function have scaled them (McKinsey, 2026).

That gap is the opportunity. This piece is about closing it: turning infrastructure from a cost centre that quietly grows into the platform that orchestrates how work gets done, and earns its keep while doing it. It covers five forces, each with an exhibit and a move for the technology leader, and closes with where RealAI can help.

Key takeaways

The cost curve is coming, and agents bend it. Infra cost is set to run 2 to 3 times higher by 2030 on flat budgets (McKinsey, 2026). Agentic automation cuts run-rate 20 to 40 percent, the difference between riding the curve and bending it.
Value concentrates in a few pools. Service desk, observability and ITSM, network, hosting, and cost management are where the savings sit. The service desk is the quickest to value.
Infrastructure becomes a mesh. Silos cannot coordinate agents at scale. Domains connect through a shared orchestration layer designed around four principles: composability, decoupling, vendor flexibility and governed autonomy.
Automate the routine, keep the judgment. Agents can take 60 to 80 percent of routine work; the top tier of judgment stays human. The point is to move people up, not out.
Sequence the first 90 days. Redesign a process, strengthen operational data, establish governance, stand up an agent registry. A quick win lands early; the compounding value comes once governance is in place.

Force one: the cost curve is coming, and agents bend it

Start with the number that reframes everything else. Infrastructure cost is on a path to run two to three times higher by 2030, and budgets are not going to grow to match (McKinsey, 2026). More than one third of high performers are already committing over 20 percent of their digital budgets to AI, which pulls even harder on the same envelope. A technology leader who does nothing but provision more of the same is signing up to blow through the budget every year, with the gap widening as workloads grow.

The release valve is that agentic AI can run the infrastructure that agentic AI demands. Applied to the routine work of operations, it can automate 60 to 80 percent of it over time and cut run-rate cost 20 to 40 percent in initial deployments, with more to come as adoption deepens (McKinsey, 2026). That is not a rounding error against a doubling cost base. It is the difference between a curve that runs away from the budget and one that bends back toward it.

The move for a technology leader is to treat cost as a design constraint from the first day, not a quarterly surprise. Every workload that agents can run more cheaply is budget you get back to spend on the workloads that grow the business. The organisations that win this decade will be the ones that used agentic AI to pay for its own infrastructure.

Exhibit 1: Bend the curve, do not just ride it. Drag the automation level. The ghost do-nothing curve climbs to roughly 2.5x by 2030, while the automated curve pulls down toward the flat budget line, cutting run-rate cost 20 to 40 percent. Automation is what turns a runaway cost base into one you can hold.

The exhibit shows the two futures on one chart. Do nothing and the curve climbs away from the budget. Apply agentic automation and it bends back down. The steepness of that bend is a choice a technology leader makes, workload by workload.
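The arithmetic behind the two curves can be sketched in a few lines of Python. This is an illustration under stated assumptions, not McKinsey's model: the 2.5x endpoint comes from the exhibit, while the 2025 base year, the smooth compounding path and the linear ramp of the automation cut are assumptions made here for the example.

```python
def cost_curve(multiplier_2030=2.5, start_year=2025, end_year=2030, run_rate_cut=0.30):
    """Return (year, do_nothing_cost, automated_cost) rows, indexed to 1.0 in start_year.

    Assumptions (not from the source): costs compound smoothly to the 2030
    multiplier, and the automation cut ramps linearly from 0 to run_rate_cut.
    """
    years = end_year - start_year
    annual_growth = multiplier_2030 ** (1 / years)  # constant yearly growth factor
    rows = []
    for i in range(years + 1):
        do_nothing = annual_growth ** i
        cut_so_far = run_rate_cut * i / years  # share of run-rate removed by year i
        rows.append((start_year + i, do_nothing, do_nothing * (1 - cut_so_far)))
    return rows


if __name__ == "__main__":
    for year, do_nothing, automated in cost_curve():
        print(f"{year}: do nothing {do_nothing:.2f}x, automated {automated:.2f}x")
```

With a 30 percent cut the 2030 figure falls from 2.5x to 1.75x of the base year; across the 20 to 40 percent band it lands between 2.0x and 1.5x. Under these assumptions the initial cut narrows the gap to a flat budget rather than closing it, which fits the source's point that more savings arrive as adoption deepens.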

Force two: the value concentrates in a few pools

The instinct when the savings are large is to chase all of them at once. The evidence says the value is concentrated, so sequencing beats spreading. McKinsey identifies five areas where agentic infrastructure creates the most near-term value: the service desk, observability and IT service management, network operations, hosting operations, and active cost and contract management (McKinsey, 2026). They are not equal in size or in speed to value.

The service desk is the quickest to value, and it is large. It accounts for 20 to 30 percent of infrastructure labour spend, with high ticket volumes and predictable resolution paths that suit automation. Organisations see 25 to 45 percent savings there, along with always-on support and a better employee experience. In one case an enterprise service desk handling roughly 450,000 tickets a year automated up to 80 percent of requests, redeployed half of its agent capacity to higher-value work, and reached a customer satisfaction score of 4.8 out of 5 (McKinsey, 2026). Network operations and hosting operations each return 20 to 40 percent in initial deployments; Deutsche Telekom's RAN Guardian agent is an example at scale in the network domain. Cost and contract management, which touches 40 to 60 percent of total technology spend, returns a smaller 5 to 15 percent, but on a very large base.

The move is to rank the pools by return and speed to value, then start with the one that pays back fastest and builds the credibility to fund the rest. That is usually the service desk. Prove the model there, then move into the operations pools where the spend is largest.

Exhibit 2: Sequence by the pool with the fastest return. Click a pool. The bars are ranked by run-rate savings potential, and the readout shows the spend context behind each one. The service desk is the quickest to value; the operations pools carry the largest spend.

Reading the pools this way turns a daunting programme into a sequence. You do not need to automate everything at once. You need to start where the return is fastest and let each win fund the next.
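A minimal sketch of that ranking, using only the savings ranges quoted above. Ranking by the midpoint of each range is an assumption made for the example, and the class and function names are hypothetical. The article gives no savings figure for observability and ITSM, so that pool is left unranked rather than given an invented number.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ValuePool:
    name: str
    savings_pct: Optional[Tuple[int, int]]  # (low, high) run-rate savings; None if not stated

    @property
    def midpoint(self) -> Optional[float]:
        if self.savings_pct is None:
            return None
        low, high = self.savings_pct
        return (low + high) / 2


# Ranges as quoted in the article (McKinsey, 2026).
POOLS = [
    ValuePool("Service desk", (25, 45)),
    ValuePool("Observability and ITSM", None),  # no figure given in the article
    ValuePool("Network operations", (20, 40)),
    ValuePool("Hosting operations", (20, 40)),
    ValuePool("Cost and contract management", (5, 15)),
]


def rank_pools(pools):
    """Sort by midpoint savings, highest first; pools without a figure go last."""
    return sorted(pools, key=lambda p: (p.midpoint is None, -(p.midpoint or 0.0)))


if __name__ == "__main__":
    for pool in rank_pools(POOLS):
        label = "not stated" if pool.midpoint is None else f"{pool.midpoint:.0f}% midpoint"
        print(f"{pool.name}: {label}")
```

A fuller version would weight each range by the spend it applies to, since a 5 to 15 percent saving on 40 to 60 percent of technology spend can outweigh a larger percentage on a smaller base.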

Force three: infrastructure becomes a mesh

Underneath the savings is an architectural shift a technology leader has to lead deliberately. Infrastructure built for human-led operations, with each domain running its own tools in its own silo, cannot coordinate agents across those domains. To run agents at scale it has to become more mesh-like, where domains connect through a shared orchestration layer that lets work flow across them while control and reuse are preserved (McKinsey, 2026).

That shared layer rests on four foundational capabilities. Actions have to be repeatable and executable through secure APIs, with policy checks built in. Operational data has to be reliable, with clear sources of truth for assets, dependencies and ownership. Controls and agent governance have to be embedded, so every agent has an owner and every action is logged and auditable. And the agent estate needs lifecycle management, interoperability and context, so agents can operate across systems safely. On top of those, a set of design principles keeps the mesh healthy: composability so components can be reused, decoupling so execution and data layers evolve independently, vendor flexibility so no single tool locks you in, and governed autonomy so agents act within clear policies.

The important point is that this is an evolution, not a rip and replace. Most organisations already run tools like ServiceNow, cloud management, network controllers and observability stacks. The move is not to replace them but to connect them into one coherent estate that enables cross-domain coordination and reuse, while keeping the freedom to build and host agents where cost and data sensitivity dictate.

Exhibit 3: Silos cannot coordinate agents at scale. Click the toggle. In silos the domain tools sit disconnected and agents lose the thread across domains. In mesh they connect to a shared orchestration layer that carries the four principles: composability, decoupling, vendor flexibility and governed autonomy.

Toggle the topology and the argument lands. Disconnected tools cannot hand work between them reliably. A shared orchestration layer is what lets agents coordinate across the whole estate, which is the precondition for scaling any of the value pools.
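To make governed autonomy concrete, here is a minimal sketch of how a shared layer might check policy before an agent acts and log every attempt against a named owner. It assumes a simple allow-list policy; the class names, fields and action names are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Agent:
    agent_id: str
    owner: str                       # every agent has a named human owner
    allowed_actions: FrozenSet[str]  # hypothetical allow-list policy


@dataclass
class ActionGateway:
    """Single entry point for agent actions: policy check first, audit always."""
    audit_log: List[dict] = field(default_factory=list)

    def execute(self, agent: Agent, action: str, target: str,
                handler: Callable[[str], None]) -> str:
        allowed = action in agent.allowed_actions
        # Log the attempt whether or not it is permitted, so it stays auditable.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent.agent_id,
            "owner": agent.owner,
            "action": action,
            "target": target,
            "allowed": allowed,
        })
        if not allowed:
            return "escalated"  # outside policy: hand off to a human
        handler(target)
        return "executed"
```

Routing every action through one gateway is the design choice that matters here: the domain tools behind the handlers can change without touching the policy check or the audit trail.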

Force four: automate the routine, keep the judgment

The fear that follows any automation number is that it comes for the people. The shape of the work says otherwise. Agents are strong at the routine and repetitive tasks that dominate infrastructure operations, and weaker at the judgment calls that sit above them. The 60 to 80 percent that agents can automate is the routine tier: password resets, ticket triage, alert correlation, predefined remediation, capacity rightsizing (McKinsey, 2026). The tier above, complex root-cause analysis and architecture decisions, stays human, and the people who did the routine work move up to it.

This is why the run-rate savings and the workforce story are the same story. As agents take the routine tier, engineers are freed from responding to alerts and running manual fixes, and redeployed to the higher-value work of proactively managing risk and designing the estate. In the service desk example, half of the agent capacity was redeployed rather than removed (McKinsey, 2026). The organisations that handle this well treat the shift as a move up the ladder, and they invest in it, because an engineer who used to triage tickets does not automatically know how to supervise a fleet of agents.

The move for a technology leader is to draw the automation line deliberately and to reskill the people crossing it. Automate to the ceiling of the routine tier, keep judgment human, and build the capability that lets your team operate at the higher altitude the automation creates.

Exhibit 4: Automate the routine, keep the judgment. Drag the automation frontier up the ladder of tasks, from routine at the base to judgment at the top. The rungs below turn automated and the run-rate cut rises to the 20 to 40 percent band, while a judgment cap stays human at the top.

The ladder makes the boundary concrete. There is a ceiling to how far automation goes, and it is not the whole job. The value is in automating everything below the judgment tier and moving your people into the tier above it.

Force five: sequence the first ninety days

Agent-ready infrastructure is not a quick fix, and the first 90 days decide whether it builds momentum or stalls. The sequence that works is the same one McKinsey lays out for the first quarter: redesign a targeted process, strengthen the operational data underneath it, establish the governance that lets agents act safely, and stand up explicit management of the agents themselves (McKinsey, 2026).

Order matters because value accrues unevenly. Redesigning one high-volume process, the service desk being the obvious candidate, lands a visible win in the first weeks and buys the credibility to keep going. Strengthening operational data, the clear sources of truth for assets and dependencies, is what lets agents act reliably rather than on guesswork. Governance, a clear framework of permissible actions, escalation thresholds and a named owner for every agent, is what lets you move from pilot to production without losing control. And an agent registry that tracks each agent's purpose, scope, performance and cost is what keeps the estate from fragmenting as it grows. The quick win comes early; the compounding value comes once governance and the registry are in place.

The move is to run these four as a deliberate sequence, not four parallel scrambles. Prove one process, make its data trustworthy, govern it, and register the agents that run it. Then repeat into the next value pool with a foundation that already holds.

Exhibit 5: A quick win early, compounding value later. Drag the week across the first 90 days. The four workstreams light up as they begin and the readiness line rises. A visible win lands in the first weeks; the compounding value arrives once governance and the agent registry are in place.

The runway shows why sequencing beats a big-bang programme. Value is not linear. A quick win early keeps the programme funded, and the real compounding starts once the governance and registry foundations are laid.
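An agent registry of the kind described above can start very small. The sketch below tracks purpose, scope, performance and cost for each agent and refuses to register one without a named owner. The field names, and the use of cost per resolved task as the performance measure, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AgentRecord:
    agent_id: str
    owner: str               # named owner, required at registration
    purpose: str
    scope: Tuple[str, ...]   # systems or domains the agent may touch
    tasks_resolved: int = 0  # simple performance counter (assumed metric)
    cost_usd: float = 0.0    # accumulated run cost


class AgentRegistry:
    def __init__(self) -> None:
        self._records: Dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        if not record.owner.strip():
            raise ValueError("every agent needs a named owner")
        if record.agent_id in self._records:
            raise ValueError(f"agent {record.agent_id!r} is already registered")
        self._records[record.agent_id] = record

    def record_usage(self, agent_id: str, tasks_resolved: int, cost_usd: float) -> None:
        record = self._records[agent_id]
        record.tasks_resolved += tasks_resolved
        record.cost_usd += cost_usd

    def cost_per_task(self, agent_id: str) -> Optional[float]:
        """Cost per resolved task, or None if the agent has resolved nothing yet."""
        record = self._records[agent_id]
        if record.tasks_resolved == 0:
            return None
        return record.cost_usd / record.tasks_resolved
```

Even this much gives a single place to ask which agents exist, who owns them and what each costs per unit of work, which is the question that gets harder as the estate grows.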

Where to start

The five forces are one argument seen from five angles: infrastructure is becoming the platform that runs the work, and agentic AI is how it pays for itself. The sequence to get there is the one that runs through this whole series.

Assess first. Rank your infrastructure value pools by return and speed to value, and pick the process worth proving first. The service desk is usually the fastest payback and the cleanest place to learn.

Transform the estate into a mesh. Connect the tools you already run through a shared orchestration layer, evolving each layer rather than replacing it, and give every agent a walled-off, audited runtime so autonomy stays governed as it grows.

Sustain what you build. Put the agent registry, the observability and the cost governance in place from day one, because that recurring discipline is what keeps infrastructure earning its keep instead of drifting back into a cost centre. Reskill the team into the higher-altitude work the automation creates.

For the first time in decades, a technology leader can redefine how infrastructure is built and how work is executed on top of it. The leaders who treat agentic AI as an incremental automation layer will see localised gains. The ones who rebuild infrastructure to be run by agents, and to pay for itself, will change their organisation's speed, resilience and economics at the same time.

2-3x: Infra cost rise by 2030 on flat budgets (McKinsey, 2026)
60-80%: Of routine infra work agents can automate (McKinsey, 2026)
20-40%: Run-rate cut in initial deployments (McKinsey, 2026)
4-6 wks: To a value-first infrastructure roadmap (RealAI)

Infrastructure stops being the cost of doing the work and becomes the platform that runs it. Built right, it earns its keep, and then it earns more.

This is the third in a three-part RealAI series on the foundations of scaled AI, written for the leaders who own the estate. First was the data-readiness dividend, then the foundations before agents. Together they trace one path: a governed data estate, an agentic foundation, and the infrastructure that runs it. The series responds to McKinsey's 2026 technology research on AI data readiness, agentic foundations, and infrastructure.

“For the first time in decades, infrastructure stops being the thing that supports the work and becomes the platform that runs it. The leaders who see that will change their organisation's speed, resilience and economics at once.”

