AI's Production Era Is An Infrastructure Story

For most of the generative AI boom, HPE has been easy to overlook. The headlines belonged to the model labs, and to NVIDIA, whose chips powered them. Traditional infrastructure vendors like HPE, and the compute, storage, and networking they offer, sat in the background as assumed plumbing.

That’s changing. AI is maturing out of the lab into persistent, high-volume production. Token demand is exploding on the back of longer context windows, reasoning models, and agents that loop through many steps per request, and service providers and neoclouds are racing to build capacity to meet it. As the bills climb and frontier-lab subsidies thin, the largest buyers (giant enterprises, sovereigns, service providers) are starting to weigh owning AI infrastructure against renting it.

NVIDIA’s Jensen Huang has a name for what they’re building: the AI factory. Energy and data in, tokens and intelligence out. Inside that factory, boring, old-fashioned infrastructure does the work: compute, networking, storage, power, cooling, operated at staggering scale.

Suddenly the center of gravity in AI is drifting toward exactly the terrain HPE has worked for decades, and on economic terms as much as technical ones. Whether you own the factory or rent its output, the cost of AI increasingly tracks the cost of running physical plant: the power it draws, how fully it’s utilized, how efficiently it moves data. The price of a token bottoms out in the economics of infrastructure.

So when HPE CTO Fidelma Russo took the Discover 2026 keynote stage and said “AI economics are starting to look like infrastructure economics,” she captured, for me, the phase we’re entering.

That’s the bet HPE made at Discover 2026: the systems that turn models, data, and infrastructure into reliable production capacity will shape the next phase of enterprise AI. It’s also the same shift down the stack I flagged at Google Cloud Next this year, away from raw capability and toward the unglamorous work of running this stuff in production. Two very different companies, converging on the same read of where AI’s hard problems sit today. HPE’s version of that bet holds in some places, remains an unproven claim in others, and could unravel in a few.

From Tokenomics To Infrastructure Strategy

The fastest way to understand the shift is to follow the money. “Tokenomics” has crossed from AI jargon into budget planning, and the reason is agents.

Agentic systems create a cost-forecasting problem that a single model call never did. Agents plan, retrieve context, call tools, reason over results, validate outputs, retry, and coordinate with other systems. Reasoning models push token use higher still. One user request can trigger a long chain of model interactions, and that chain varies from run to run. Multiply that by agents that live inside business processes instead of waiting to be summoned, and token consumption becomes a workload IT has to operate.

That was Russo’s point in her CTO keynote, and it’s why her framing is timely. Once agents have continuous access to enterprise data, IT teams end up running inference as a continuous operational load, with all the familiar infrastructure concerns attached: cost, utilization, efficiency, and scale.
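To make the multiplication concrete, here is a minimal sketch of how one agentic request stacks up against a single model call. Every token count, step name, and retry count below is a hypothetical placeholder of mine, not a figure from HPE or the keynote.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    input_tokens: int
    output_tokens: int


def request_tokens(steps, retries=0):
    """Total tokens for one request; each retry repeats the whole chain."""
    per_pass = sum(s.input_tokens + s.output_tokens for s in steps)
    return per_pass * (1 + retries)


# Hypothetical token counts for the steps of one agentic request.
chain = [
    Step("plan", 1_200, 300),
    Step("retrieve_context", 4_000, 200),
    Step("call_tool", 1_500, 150),
    Step("reason_over_results", 6_000, 900),
    Step("validate_output", 2_000, 100),
]

baseline = request_tokens([Step("single_model_call", 1_200, 300)])
clean_run = request_tokens(chain)               # chain succeeds first time
retried_run = request_tokens(chain, retries=2)  # same request, two retries

print(baseline, clean_run, retried_run)
```

Under these made-up numbers the same user request costs roughly eleven times a single call when everything goes right and about three times that again when it retries twice, which is the forecasting problem in miniature.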

When I sat down with Thierry Piennar, HPE’s CTO of AI and HPC, for a Briefing Room conversation at the event, he put the same idea in engineering terms. “You have to look at inference from a cost optimization perspective, power to token,” he told me, then walked through the levers: memory optimizations like KV cache and CXL, intelligent routing like NVIDIA’s Dynamo, and matching each workload to the right mix of GPU, CPU, and memory strategy. The goal, in his words, is “the ability to optimize from the request all the way to the token.” That’s infrastructure engineering aimed squarely at the cost of a token.
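A back-of-the-envelope version of “power to token” helps show why utilization sits at the center of that optimization. The sketch below is my own simplification, not HPE’s model: it counts electricity only (no hardware, cooling, or staffing), assumes the system draws constant power whether or not it is serving requests, and uses hypothetical numbers throughout.

```python
def energy_cost_per_million_tokens(power_kw, price_per_kwh,
                                   tokens_per_second, utilization):
    """Electricity cost per one million generated tokens.

    Simplifying assumption: the system draws `power_kw` around the clock
    but only produces tokens for the `utilization` fraction of the time.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    cost_per_hour = power_kw * price_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical system: 10 kW draw, $0.10 per kWh, 5,000 tokens/s at full load.
half_used = energy_cost_per_million_tokens(10, 0.10, 5_000, utilization=0.5)
fully_used = energy_cost_per_million_tokens(10, 0.10, 5_000, utilization=1.0)

print(f"50% utilized:  ${half_used:.4f} per million tokens")
print(f"100% utilized: ${fully_used:.4f} per million tokens")
```

In this toy model, halving utilization doubles the energy cost of every token, and anything that raises effective throughput (cache reuse, smarter routing, better workload-to-hardware matching) lowers it.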

Russo brought receipts to back up her claim, to the tune of $100,000 per month saved on an internal project called Mindstone. Every day, HPE’s storage and support systems ingest billions of operational signals from customers, and as those systems grew more autonomous, token consumption climbed right along with them. So HPE’s engineers built Mindstone, an AI-first support platform, and ran it on-premises on GreenLake Intelligence and Private Cloud AI, gaining greater control over the economics, better-governed customer data, improved performance, and a reported 30x drop in cost. “We stopped being consumers of AI,” Russo said, “and we became producers of intelligence.”

Those are HPE’s own figures, as is the headline Alletra Storage MP X10000 claim of up to 20x faster time-to-first-token and 17x higher throughput. To its credit, HPE walked through the test on stage: real financial-services workloads on NVIDIA H200 GPUs, with KV cache extended beyond GPU memory over object storage and RDMA, plus third-party validation from HPE partner Kamiwaza. It hasn’t published a full methodology, so read the multipliers as directional rather than independently proven.

HPE keynote slide claiming 17x higher throughput and 20x faster time to first token
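Why extend KV cache beyond GPU memory at all? The standard sizing arithmetic gives a feel for it. The model shape below is a hypothetical example of mine, not the workload HPE tested, and the calculation ignores model weights and any cache compression.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   bytes_per_value, num_tokens):
    """KV cache size for one sequence.

    Each token stores one key and one value vector (the factor of 2)
    per layer and per KV head.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens


GIB = 1024 ** 3

# Hypothetical model: 80 layers, 8 KV heads of dimension 128, 16-bit values,
# serving one session with a 128K-token context.
one_session = kv_cache_bytes(80, 8, 128, 2, num_tokens=128 * 1024)

# An H200 carries 141 GB of HBM; this ignores the space the weights need.
sessions_in_hbm = int(141e9 // one_session)

print(f"{one_session / GIB:.0f} GiB per session, "
      f"{sessions_in_hbm} sessions fit in HBM")
```

With these assumptions a single long-context session needs about 40 GiB of cache, so only a handful fit on one GPU even before weights are loaded. That is the pressure that makes a storage tier for KV cache architecturally interesting, whatever the exact multipliers turn out to be.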

While HPE’s example is compelling, the pushback is convincing as well. The industry litigated much of the workload-placement debate over a decade ago, and public cloud became the default answer for a broad class of enterprise workloads because it was faster to provision, easier to scale, and less operationally burdensome than building and running infrastructure directly.

But that default was never universal. Cloud has been strongest for greenfield applications, bursty demand, and workloads whose scale or utilization is uncertain. Owned or dedicated infrastructure has continued to make sense where workloads are large, steady, specialized, latency-sensitive, regulated, or tightly coupled to existing data and operational systems.

AI does not reverse the cloud’s advantages. For many teams, cloud remains the fastest way to try new models, access scarce accelerators, and avoid building specialized operational capacity before demand is proven.

But AI does create more of the kind of workload that makes placement strategy matter: sustained inference, data-intensive pipelines, latency-sensitive user experiences, regulated deployments, and agentic systems that need controlled access to enterprise data and operational systems.

HPE’s argument isn’t a simple swing back to on-premises infrastructure. Piennar was explicit: “Hybrid is the reality.” The point is coexistence. Cloud remains essential, but for enterprises moving AI from pilots into production, the more important question is which workloads belong where, under what economics, and with what operational controls.

That framing is also less self-serving than it looks: HPE isn’t arguing only for the private side of the ledger. It sells AI Factory infrastructure directly to the neoclouds and service providers building out the public AI capacity for enterprises that prefer renting to owning. HPE hopes to power the workload wherever it lands.
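The “which workloads belong where, under what economics” question can be reduced, very roughly, to a break-even calculation. The sketch below is an illustration under my own assumptions: a flat rental price per million tokens, a fixed monthly cost for owned capacity, and a lower marginal cost per million tokens once you own it. All figures are hypothetical.

```python
def breakeven_millions(fixed_monthly, rent_price_per_million,
                       own_cost_per_million):
    """Monthly volume (in millions of tokens) where owning matches renting."""
    if rent_price_per_million <= own_cost_per_million:
        raise ValueError("owning never breaks even unless its marginal "
                         "cost is below the rental price")
    return fixed_monthly / (rent_price_per_million - own_cost_per_million)


def cheaper_option(monthly_millions, fixed_monthly,
                   rent_price_per_million, own_cost_per_million):
    """Return 'own' or 'rent' for a given steady monthly token volume."""
    rent = monthly_millions * rent_price_per_million
    own = fixed_monthly + monthly_millions * own_cost_per_million
    return "own" if own < rent else "rent"


# Hypothetical: $60,000/month fixed cost to own, $2.00 per million tokens
# rented, $0.50 per million tokens marginal cost when owned.
threshold = breakeven_millions(60_000, 2.00, 0.50)
small_workload = cheaper_option(10_000, 60_000, 2.00, 0.50)
large_workload = cheaper_option(100_000, 60_000, 2.00, 0.50)

print(threshold, small_workload, large_workload)
```

The model leaves out everything that makes the real decision hard (demand uncertainty, staffing, data gravity, compliance), but it shows why steady, high-volume inference pulls toward ownership while small or unpredictable workloads stay rented.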

AI Factory vs. Private Cloud AI

HPE has two top-line AI infrastructure offerings: Private Cloud AI for enterprise-scale deployments, and AI Factory for everything at service-provider scale and up.

That split creates real confusion, though. I rarely heard two people use “AI factory” the same way twice in a row.

I put the question to the two people best placed to answer it: Thierry Piennar, HPE’s CTO of AI and HPC, and Kaushik Shirhatti, NVIDIA’s VP of AI Factory, in a special edition of the TWIML Briefing Room recorded live from the show floor.

Even HPE CEO Antonio Neri has used “AI factory” more loosely than that split suggests: in keynotes he described Private Cloud AI as “a turnkey AI factory” and a “prepackaged AI factory.” NVIDIA uses the term even more broadly: Shirhatti told me in our interview that to NVIDIA, an AI factory is everything from a DGX Spark on someone’s desk to a hyperscale datacenter.

I saw the results of this confusion at play at one of the conference sessions I attended, where an attendee asked the speaker whether AI Factory is only a rack-scale offering, or whether there’s anything smaller. Yes, there’s a smaller version. HPE just calls it Private Cloud AI instead of AI Factory.

At one point during the event I settled on my own distinction between these two: PCAI is the single-tenant offering while AI Factory has a multi-tenant focus, which I still think is directionally correct but has its own flaws.

In the end, we may need to chalk this one up to Conway’s Law: PCAI is owned by HPE’s Hybrid Cloud business unit, while AI Factory is a product of the Servers BU.

Private Cloud AI

PCAI is ultimately HPE’s enterprise AI infrastructure offering: AI close to your data and under your governance, with controls around models, agents, identity, observability, and recovery. It ships in the T-shirt sizes you’d expect: small, medium, and large configurations matched to how far a customer’s AI practice has progressed. PCAI is for the company moving from pilots to production and looking for help deploying, operating, governing, and recovering AI systems, where data, policy, cost, uptime, and auditability all carry weight.

HPE announced several new Private Cloud AI capabilities with NVIDIA at Discover: agent registration and governance, a unified model gateway, multi-node inference scaling up to 256 GPUs, MCP support in HPE Data Fabric, and Zerto-based recovery for agentic environments.

AI Factory

AI Factory is the scale offering for service providers, neoclouds, telcos, and other operators running AI infrastructure at cloud scale, where token generation is the product. Vultr was the cleanest example at Discover, selecting HPE and NVIDIA infrastructure, including NVIDIA GB300 NVL72 by HPE, Spectrum-X Ethernet, liquid cooling, and high-speed networking, for its cloud-scale AI data centers. This pattern also picks up a sovereign flavor, what HPE brands the Sovereign AI Factory, focused on governments, national providers, regulated industries, and regional ecosystems that need data residency, local control, compliance, resilience, and reduced foreign dependency.

The HPE-NVIDIA relationship featured prominently in both the PCAI and AI Factory stories. As Russo put it from the keynote stage, “HPE doesn’t build the GPUs. We help them spend more time working and less time waiting.” That’s the whole positioning in two sentences. When I asked Piennar and Shirhatti about the partnership directly, they described co-engineering work and support for joint customers, and Shirhatti made the case that it shows up in reference designs that force the whole stack to be planned together: “we validate everything, and the more cohesive the plan is, the [better] the chances of success.”

The Rise of Sovereign AI

Governments and enterprises spent most of the past year treating sovereign AI as a theoretical concern. In the past month, they started treating it as something they have to act on.

Shirhatti framed sovereign AI as an economic engine, not a compliance checkbox. A serious sovereign strategy, he argued, is “the biggest place where they can drive economic value for the country,” letting a nation “build local models, get local context, hire and attract talent, start this wave of entrepreneurship.” He called it a flywheel, and noted that the countries pulling ahead are the ones with the conviction to go all-in rather than dip a toe.

Piennar grounded it in architecture and in a specific example. Sovereign AI, he said, “at the end of the day really is about strategic autonomy,” a nation or entity keeping “as little dependency as possible” even though “you never control the full stack.” TELUS, the Canadian telco, was his case study: a company “servicing in the interests of Canada” that is building citizen services such as “taxation services or passport applications in multiple dialects,” alongside national-defense simulation and an AI economic-empowerment program that helps local startups reach a minimum viable product. It also drags in concrete security obligations, ProtectV and NIST 800 among them. His bottom line was blunt: for a modern nation, sovereign AI is “an absolute competitive necessity,” often delivered as a federation of private sector, research institutes, and vendors like HPE and NVIDIA.

Sovereign AI is a credible role for HPE because, in practice, it isn’t a policy story. It gets built from facilities, power, networking, data, operations, compliance, and lifecycle support, and those are the pieces HPE assembles.

What Happens When Agents Break Things

Agent governance gets less attention than it deserves, and the part that gets least is the part enterprises will need first.

Most vendors frame agent safety around prevention: guardrails, policies, filters, permissions, approvals. That’s necessary but insufficient. Prevention assumes you can anticipate the failure modes of a system that plans and acts on its own. You can’t. Production agents need observability, identity, registries, auditability, rollback, blast-radius control, and recovery first.

Russo introduced GreenLake Intelligence’s agentic mesh in her keynote to address this: a centralized registry for agent identity and policy, and a planning agent that decides which specialized agents should handle a given request and coordinates the work across them. Zerto’s recovery story for agent-driven errors is also worth a look: if an agent modifies the wrong system, deletes the wrong data, escalates a workflow, or pushes a bad change, who sees it, and who rolls it back? An infrastructure vendor with a credible answer to that question is selling something a frontier model provider can’t.
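To show what “recovery first” means mechanically, here is a toy sketch of a registry that checks an agent’s permissions, records every action with an undo step, and can roll one agent’s work back. It is purely illustrative: the class names, methods, and ticket example are hypothetical and do not represent GreenLake Intelligence’s agentic mesh or any Zerto API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditEntry:
    agent_id: str
    action: str
    undo: Callable[[], None]


@dataclass
class AgentRegistry:
    allowed: dict = field(default_factory=dict)    # agent_id -> set of actions
    audit_log: list = field(default_factory=list)  # oldest entry first

    def register(self, agent_id, actions):
        self.allowed[agent_id] = set(actions)

    def execute(self, agent_id, action, do, undo):
        """Run an action only if this agent is registered for it, then log it."""
        if action not in self.allowed.get(agent_id, set()):
            raise PermissionError(f"{agent_id} may not perform {action}")
        do()
        self.audit_log.append(AuditEntry(agent_id, action, undo))

    def rollback(self, agent_id):
        """Undo one agent's logged actions, newest first; return the count."""
        mine = [e for e in self.audit_log if e.agent_id == agent_id]
        for entry in reversed(mine):
            entry.undo()
        self.audit_log = [e for e in self.audit_log if e.agent_id != agent_id]
        return len(mine)


# Hypothetical scenario: a support agent closes tickets it should not have.
tickets = {"T-1": "open", "T-2": "open"}
registry = AgentRegistry()
registry.register("support-agent", {"close_ticket"})


def close(ticket_id):
    previous = tickets[ticket_id]
    registry.execute(
        "support-agent", "close_ticket",
        do=lambda: tickets.__setitem__(ticket_id, "closed"),
        undo=lambda: tickets.__setitem__(ticket_id, previous),
    )


close("T-1")
close("T-2")
after_actions = dict(tickets)                 # state while the mistake stands
undone = registry.rollback("support-agent")   # who rolls it back? the registry
```

The design choice worth noticing is that the undo step is captured at the moment of action, alongside identity and permission checks. Real systems have to handle side effects that cannot be undone so neatly, which is where snapshot-style recovery comes in.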

The rest of the event’s agent-governance news, though, is less new infrastructure than it is HPE agent-enabling the ITSM stack it already sells. Ops vendors are racing to bolt agents onto their existing tools, and HPE doing the same to GreenLake, Morpheus, OpsRamp, and Zerto is table stakes. Three new copilots (Compute, Morpheus Orchestration, and OpsRamp Observability) bring the agentic mesh into day-to-day operations, alongside a new ServiceNow integration for autonomous service delivery. OpsRamp added AI observability, including visibility into running agents, model consumption, and token spend. Morpheus 9 extended HPE’s hybrid cloud control plane.

HPE keynote slide on Zerto software for agentic AI

The Rest Of The Portfolio

A handful of other announcements reinforced the same thesis from different angles.

Networking got outsized keynote real estate: Antonio Neri opened with “Architecting for AI starts with your network,” and Rami Rahim’s networking keynote extended the claim in both directions: AI changes what the network must support, and AI changes how teams operate it. The Juniper acquisition gives HPE a much wider footprint to make that case: Marvis AI coming to Aruba Central, HPE Networking CX switches coming to Mist, AI-native SASE through EdgeConnect, AI data center networking with QFX switches and Apstra Data Center Director, and scale-up networking for AMD Helios systems, backed by customer names including Ohio State, Royal Bank of Canada, Sentara Health, Disney, and the Olympics.

In an analyst-only session, Neri said that if forced to prioritize, he’d put HPE’s networking portfolio above everything else in the company. From an AI-infrastructure perspective, though, it’s a supporting input to the metrics that matter (utilization, throughput, time-to-first-token, cost).

HPE Data Fabric 8.2 added agent-aware capabilities and a simplified global data catalog, plumbing for governed, discoverable, secure data across distributed environments. This is more central than it sounds. When I asked Piennar what trips organizations up when they bring AI in-house, he went straight to data: “AI is really not meaningful without data that is good data. Garbage in, garbage out.” Mapping, curating, and protecting enterprise data, he said, remains “a huge challenge,” which is exactly the gap Data Fabric is meant to close.

HPE positioned Alletra Storage MP X10000 as active AI memory through KV cache offload, RDMA, file and object support, metadata enrichment, and NVIDIA-certified validation. Treat the performance numbers as claims, but keep the architectural point that the storage and memory hierarchy shapes inference economics. And HPE NonStop with Lusis TANGO AIF supplied the strongest vertical example: fraud detection imposes hard constraints (high transaction volume, low latency, resiliency, real business consequences) and turns agentic AI into part of a mission-critical decision system rather than a general-purpose assistant demo.

Opportunities And Challenges

Inference is growing, agents are driving demand and unpredictability, enterprises care more about governance and cost, and sovereign AI has moved from policy language into procurement. That is a real opportunity for HPE if it can capitalize. Neoclouds, telcos, governments, regulated industries, and large enterprises need infrastructure patterns more nuanced than “call an API.” HPE has assets that fit the moment: servers, storage, networking, GreenLake, Morpheus, OpsRamp, Zerto, Cray/HPC credibility, and the NVIDIA partnership. If it can make those pieces feel like one platform, it has a stronger AI story than most buyers currently give it credit for.

The obstacles are just as concrete:

  • Perception comes first. HPE has deep enterprise trust, but many AI buyers still file it under hardware, not AI innovation. In a hype-driven market that’s a liability, though it flips to an asset the moment buyers start prioritizing cost, reliability, control, and operations, which is exactly the shift HPE is speaking to.
  • Fuzzy messaging creates unnecessary friction. The Private Cloud AI vs. AI Factory line is a defensible internal distinction that isn’t landing externally. NVIDIA uses “AI factory” to span everything from a desktop DGX Spark to a hyperscale datacenter, and I watched an attendee at a separate NVIDIA session ask point-blank whether AI Factory is rack-scale only, a sign the boundary HPE draws isn’t reading the way it intends. My take from the event: HPE should adopt one AI Factory umbrella with enterprise, service-provider, and sovereign variants, and organize under NVIDIA’s broader usage rather than fight it with parallel vocabulary.
  • Differentiation is getting harder. Dell, Supermicro, Lenovo, Cisco, the cloud providers, and the neoclouds want the same budget. HPE has to show why its specific combination of servers, networking, GreenLake, AI operations, storage, and NVIDIA integration produces a different outcome, not just a different logo on the same rack.
  • The economics proof may be the hardest. AI factory ROI depends on utilization, power, cooling, staffing, software, networking, data locality, and operational maturity, a long chain where any weak link undercuts the TCO story. Buyers should treat vendor TCO claims skeptically, including HPE’s. Piennar cited independent studies showing enterprise AI ROI climbing from around 5% a year ago toward the 30% range today. The catch: returns depend on intentionality. As Shirhatti put it, the successful customers “pick 4 or 5 projects and go deep,” reimagining processes rather than lightly optimizing them, instead of dipping a toe and hoping.

Takeaways

If you remember five things from Discover, make them these:

  • AI is a production infrastructure problem now. The unit of analysis is the whole system: data, memory, GPUs, CPUs, network, storage, power, cooling, governance, observability, recovery, and the operating model around all of it.
  • Inference and agents are moving the economic center of gravity. Training still matters, but most enterprises will feel AI’s cost and complexity through inference: agent loops, tool calls, context retrieval, model routing, governance, and continuous operation.
  • HPE’s most credible AI story is enterprise deployment infrastructure. It’s a less thrilling pitch than frontier AI, but it’s where enterprises will feel the most pain.
  • Sovereignty is an infrastructure architecture now. TELUS, the University of Utah, and other regional AI factory examples show local control, data residency, and national capacity moving from policy language into actual deployments.
  • Agent governance needs recovery as much as prevention. Enterprises will have to monitor agents, meter them, constrain them, and roll back cleanly when one gets it wrong.

The Bottom Line

The first phase of generative AI was about access to intelligence. The next phase is about making that intelligence useful, governable, economical, and resilient inside real organizations, a less glamorous story than frontier models and quite possibly the more consequential one for enterprises. Production AI needs infrastructure that can carry persistent inference, agentic workflows, governed data access, workload placement, observability, recovery, and an increasingly complicated set of economics. That’s the terrain HPE wants to own.

What HPE still owes its buyers is a simpler story. Its own partner made the point better than any critic could. AI, Shirhatti told me, has “helped break down silos within companies,” because you “cannot operate in a silo to go build an AI factory.” Storage, compute, software, and data teams that used to run separately now have to plan as one. HPE’s product line has the same problem in mirror image. Private Cloud AI, AI Factory, Sovereign AI Factory, GreenLake Intelligence, Morpheus, OpsRamp, Zerto, Aruba, Juniper, Alletra, ProLiant, Cray, and NVIDIA each have a role to play, but buyers don’t want a parts list. They buy a clear answer to a practical question: how do we run AI at scale, under control, at a cost we can understand? Get the parts to answer that question in one voice, and HPE’s bet looks smart. Leave them as a catalog, and the company cedes the narrative it’s best positioned to win.

That, more than any single announcement, was my takeaway from HPE Discover 2026. Enterprise AI is moving from possibility to production. HPE is betting production will be an infrastructure problem, and on the evidence of this event, that’s a bet worth taking seriously.

If you’re running enterprise AI in production, or wrestling with where your inference should run, I’d love to hear how you’re thinking about it. And if you were at Discover too, tell me what landed for you.