The New Agentic AI Infrastructure

The promise of the technology is forcing new approaches to the framework that will support it

By Andrew Wig

Image by IndrePau / stock.adobe.com

As AI gets better at serving us, it’s also becoming more demanding. 

 

It’s gone from the predictive work of analyzing and classifying data, to the generative work of creating text and images, to the point now where it has agency, capable of using tools to complete tasks on its own. AI is no longer just about finding connections in data or spitting out reports; agents are about action and complex coordination, driving workflows involving dozens or hundreds of steps across multiple models, tools and systems. 

 

This is requiring enterprises to rethink the way they get things done, and requiring technology providers to rethink the support they provide. 
“When we start doing the enterprise-level agentic AI, then we need completely different infrastructure,” 
says Saurabh Shrivastava, global head of specialist solutions architecture at AWS.

And all that infrastructure requires an “immense amount of innovation” in networking, storage and hardware, observes Rohit Badlaney, general manager of IBM Cloud, Product, Design and Industry Platforms. The unique requirements of agentic AI will force 80% of enterprises to modernize their “legacy cloud environments” to new platforms for AI workloads by 2027, market research firm IDC predicts.

Devops graphicDevops graphic

MCP: The Straw That Stirs the Drink

The overarching component enabling agentic AI is the orchestration layer, where agents can be built, managed and deployed. As agents work under the orchestration layer, they rely on context and memory to know what to do. 

 

Supporting agents in this capacity is the model context protocol (MCP) server, which standardizes the way agents interact with external tools, databases, APIs and other agents. The MCP server also helps support the vectorized knowledge bases that provide persistent memory for decision-making and multi-step reasoning. 

 

“You create the agents. You bring the agents together. You make those agents intelligent by creating the vectorized knowledge base and putting them onto the MCP server so they can talk to each other,” Shrivastava explains. 

Different Models, Different Resources

The compute resources powering agentic AI vary depending on the type of model. 

 

On one hand, producing the tokens that drive generative AI processing requires a lot of video memory and powerful GPU cards, notes Paulo Pereira, worldwide director for mainframe modernization at AWS. But the needs change when AI becomes agentic. 

 

“When you move into agentic, these tokens are driving actions for the agents, and the agents need to be able to perform those actions,” Pereira says. 
“So they need more CPU, they need more memory, and they need access to the systems that will have the tools that they'll be accessing.” 

Though Pereira notes the delta in overall resource consumption between purely generative AI and agentic AI is “not that significant,” agentic AI requires a dedicated runtime environment—"actual computers for these agents to run those tools and perform the actions that will be driven by the token generated by the generative AI." That's in contrast to purely generative AI, which uses a thin client supported by massive compute on the backend, he explains. 

 

 

Innovating With Open Source

Looking beyond hardware, both IBM and AWS are acknowledging the importance of open-source technologies for agentic AI. “There's a tremendous amount of innovation happening in open source,” Badlaney says. 

 

And key to IBM’s AI strategy, he notes, is the integration of Red Hat OpenShift. The open-source Kubernetes platform, which is owned by IBM, can also be deployed through AWS. 

 

“We’re very much aligning on the open ecosystem with Red Hat and with their Red Hat AI strategy. So think MCP server, think optimization of those software stacks that sit on top of the AI accelerator,” 
says Badlaney.

The Data Center: You’ve Got Options

The cloud is not a monolith. It’s home to all kinds of hardware to accommodate various workloads. In the IBM Cloud, for instance, that includes the full breadth of Nvidia silicon, but also AMD and Intel processors. 

 

“We find certain AI accelerators better suited for inferencing and certain (accelerators) better suited for training. And so we've done extensive work on full-stack optimization depending on the kind of GenAI workload you're running,” Badlaney says. “ … “They all have unique characteristics when it comes to the throughput of inferencing and the total cost of ownership to the end clients.”

 

IBM’s approach to the cloud and AI is hybrid, where workloads requiring high transactional throughput and real-time inference are left on-premises, while GenAI workloads go to the cloud, he adds. 

 

“Agentic and GenAI adds a tremendous amount of pressure on the optimization and the efficiency of the underlying cloud platforms,” he Badlaney says. “So we're constantly innovating.” 
Share this article