What prevents AI from spreading is not the model but the infrastructure, and the role of Kubernetes as a layer of “unified operation” is becoming increasingly prominent.


The bottleneck to AI adoption lies not in the model but in the infrastructure. That diagnosis is being confirmed with increasing clarity.

At the recent KubeCon + CloudNativeCon Europe conference, it became clear that the core of AI competition is no longer just model performance. The analysis that emerged there is that the biggest bottleneck in turning AI into actual services is structural: systems dispersed across cloud, edge, and on-premises environments cannot operate as a unified whole.

New research shows that the vast majority of AI projects fail to reach production, with failures more often rooted in integration and operational execution than in the models themselves. Paul Nashaavati, chief analyst at theCUBE Research, pointed out that “AI is revealing fundamental flaws in enterprise infrastructure” and that “the widespread fragmentation of cloud, edge, and on-premises deployments has become the biggest obstacle to operational AI.”

The “Sovereignty” issue makes AI infrastructure more complex

This fragmentation has recently been compounded by a new keyword: “sovereignty.” Because data sovereignty, regional regulations, and internal corporate policies intertwine, data and workloads are difficult to centralize. As a result, AI systems are being forced to evolve from a single stack into distributed operation across multiple environments.

Mike Barrett, vice president and general manager of Red Hat’s hybrid platform division, cited the example of different business units adopting different large language models: what enterprise customers want is not tools tailored to individual environments but an enterprise-wide “horizontal platform.” Red Hat’s answer is the “AI control plane,” a Kubernetes-based control layer that manages AI workloads uniformly across all environments.
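To make the idea concrete, here is a minimal sketch of what “managing AI workloads uniformly across all environments” can look like in practice: one declarative workload definition pushed, unchanged, to several clusters from a single control point. It uses the official Kubernetes Python client; the kubeconfig context names, namespace, and container image are hypothetical placeholders, and this is an illustration of the pattern, not Red Hat’s actual AI control plane implementation.

```python
# Sketch: apply the same inference workload to cloud, edge, and
# on-prem clusters from one control point. Context names, image,
# and namespace below are hypothetical.
from kubernetes import client, config

INFERENCE_DEPLOYMENT = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "llm-inference", "labels": {"app": "llm-inference"}},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "llm-inference"}},
        "template": {
            "metadata": {"labels": {"app": "llm-inference"}},
            "spec": {
                "containers": [{
                    "name": "server",
                    "image": "registry.example.com/llm-server:1.0",  # hypothetical image
                    "resources": {"limits": {"nvidia.com/gpu": "1"}},
                }]
            },
        },
    },
}

# Hypothetical kubeconfig contexts, one per environment.
for ctx in ["cloud-eu", "edge-factory-7", "onprem-dc1"]:
    api_client = config.new_client_from_config(context=ctx)
    apps = client.AppsV1Api(api_client=api_client)
    apps.create_namespaced_deployment(namespace="ai-serving", body=INFERENCE_DEPLOYMENT)
    print(f"applied llm-inference to {ctx}")
```

The point of the pattern is that the workload definition stays identical everywhere; only the target cluster changes, which is what makes cross-environment consistency auditable at all.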

Kubernetes is evolving beyond orchestration into an “operational consistency” tool

Kubernetes was not originally designed for AI inference; its primary role was deploying and managing containers. But as AI inference moves into real service environments, issues such as regional consistency gaps, latency fluctuations, resource contention, and policy drift have surfaced as everyday operational problems.
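Of those issues, resource contention is the one Kubernetes already has first-class vocabulary for. The sketch below, again using the official Kubernetes Python client with hypothetical names, shows the declarative approach: explicit GPU and memory requests plus a scheduling priority, so the scheduler arbitrates between latency-sensitive inference and batch workloads rather than operators doing it by hand.

```python
# Sketch: declare GPU/memory needs and a scheduling priority for an
# inference pod, so contention is resolved by the scheduler.
# The priority-class and pod names are hypothetical.
from kubernetes import client, config

config.load_kube_config()

# A high-priority class so latency-sensitive inference can preempt batch jobs.
priority = client.V1PriorityClass(
    api_version="scheduling.k8s.io/v1",
    kind="PriorityClass",
    metadata=client.V1ObjectMeta(name="inference-critical"),
    value=1000000,
    description="Latency-sensitive model serving",
)
client.SchedulingV1Api().create_priority_class(body=priority)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="llm-server-0"),
    spec=client.V1PodSpec(
        priority_class_name="inference-critical",
        containers=[client.V1Container(
            name="server",
            image="registry.example.com/llm-server:1.0",  # hypothetical image
            resources=client.V1ResourceRequirements(
                requests={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
                limits={"memory": "16Gi", "nvidia.com/gpu": "1"},
            ),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="ai-serving", body=pod)
```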

Rob Shouty, director of engineering at Red Hat, pointed to the open-source inference framework “llm-d,” explaining that users want not only to build cutting-edge, high-performance systems but also to tame the complexity of the operational phases that follow. In other words, instability in AI systems tends to appear not during training but during real-world service operation.

Jan Meren, vice chairman of the Cloud Native Computing Foundation (CNCF) steering committee, raised similar concerns. In his analysis, cloud-native grew up as a global open-source collaboration, but AI is creating tension between systems built on “global consistency” and the realities of regional regulation and distributed environments.

Nashaavati commented that “the essence of agentic AI is not a model problem but a platform architecture problem,” and that future competitiveness will depend more on building better infrastructure than on choosing better models.

Platform engineering is emerging as a practical solution for AI operations

The problem is that Kubernetes is too complex for every team to handle directly. Brian Stevens, CTO of Red Hat AI, noted that many of the data scientists building AI today also carry the burden of managing infrastructure. The way to bridge that gap is platform engineering.

Shouty explained that as tool fragmentation, skills gaps, and operational complexity become bottlenecks, the industry is shifting toward unified control structures centered on platform engineering and Kubernetes. Under this trend, Red Hat OpenShift AI takes on the job of abstracting training, deployment, serving, and inference across hybrid environments in a repeatable way.

Virtual machines are also entering Kubernetes

Enterprise infrastructure will not be modernized all at once. Core legacy assets such as billing systems and databases often stay in their original environments for risk-management reasons, which leaves virtual machines (VMs) and containers running as two separate operational worlds for the long term.

Surveys show that 84% of IT decision-makers struggle with managing VMs and containers separately. Daniel Messel of Red Hat said, “Virtualization and containers should not remain isolated islands; they should be on the same platform.” KubeVirt, a mature CNCF project, makes that possible by running virtual machines and containers side by side inside Kubernetes.

This is read as a strategy not of eliminating legacy systems but of pulling existing systems into the same control layer and unifying the operational interface.
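As a rough illustration of what “the same control layer” means, the sketch below submits a KubeVirt VirtualMachine through the ordinary Kubernetes API, exactly the way a container workload would be submitted. It assumes a cluster with KubeVirt already installed; the VM name, namespace, and legacy-workload framing are hypothetical.

```python
# Sketch: a KubeVirt VirtualMachine managed through the same Kubernetes
# API as containers. Requires KubeVirt to be installed in the cluster.
from kubernetes import client, config

config.load_kube_config()

vm = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachine",
    "metadata": {"name": "legacy-billing-vm"},  # hypothetical legacy workload
    "spec": {
        "running": True,
        "template": {
            "spec": {
                "domain": {
                    "devices": {"disks": [{"name": "rootdisk", "disk": {"bus": "virtio"}}]},
                    "resources": {"requests": {"memory": "4Gi"}},
                },
                "volumes": [{
                    "name": "rootdisk",
                    "containerDisk": {"image": "quay.io/containerdisks/fedora:latest"},
                }],
            }
        },
    },
}

# VMs become just another custom resource under Kubernetes control.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubevirt.io", version="v1",
    namespace="legacy", plural="virtualmachines", body=vm,
)
```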

Some also point out that “convenience” does not equal control

Although sovereign AI is presented as an alternative, some argue it actually comes with more constraints. Regulations limit data movement and corporate policies block centralization, so workloads end up distributed across cloud, on-premises, and edge whether or not companies are ready.

Gabriele Bartolini of EnterpriseDB emphasized that without ensuring database portability, true sovereignty is impossible. He clarified that the “convenience” of managed services does not equate to control. Jan Meren also stated that in sovereignty discussions, it is important to distinguish between “code sovereignty” and “deployment sovereignty”; code can exist as a global open-source asset, but actual deployment is directly affected by laws and policies.

In this context, Kubernetes’s role becomes clearer: it takes globally shared code and makes it operable in environments governed by different regional restrictions.
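A small sketch of that distinction, again with the Kubernetes Python client: the workload definition itself is globally shareable code (“code sovereignty”), while a node selector on the well-known topology.kubernetes.io/region label expresses the regional constraint (“deployment sovereignty”). The region value, image, and names are hypothetical.

```python
# Sketch: globally shared workload code, regionally pinned deployment.
# Scheduling is restricted to nodes labeled with the required region.
from kubernetes import client, config

config.load_kube_config()

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "llm-inference-eu"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "llm-inference-eu"}},
        "template": {
            "metadata": {"labels": {"app": "llm-inference-eu"}},
            "spec": {
                # Deployment sovereignty: only nodes in the permitted
                # legal region may run this workload.
                "nodeSelector": {"topology.kubernetes.io/region": "eu-central-1"},
                "containers": [{
                    "name": "server",
                    "image": "registry.example.com/llm-server:1.0",  # hypothetical image
                }],
            },
        },
    },
}
client.AppsV1Api().create_namespaced_deployment(namespace="ai-serving", body=deployment)
```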

Ultimately, the outcome will be decided by the ecosystem

No single company can carry the entire AI infrastructure alone. What makes an AI control plane on Kubernetes effective is not replacing the many systems involved but connecting them, and that is only possible through an ecosystem of standards, APIs, and upstream open-source projects.

Nashaavati noted that Red Hat is not only a commercial platform vendor but also one of the most active contributors in the CNCF ecosystem. That upstream work is not mere image-building; it is a core mechanism for preventing drift among vendors’ Kubernetes implementations and keeping them consistent. Red Hat is also collaborating with NVIDIA on the “Red Hat AI Factory,” combining OpenShift with NVIDIA accelerated computing to build scalable enterprise AI infrastructure.

Nashaavati put it this way: “Considering that up to 75% of enterprises experience double-digit AI failure rates due to system fragmentation, the bottleneck has shifted to infrastructure.” In other words, the problem is not missing features but the structural difficulty of making systems work together seamlessly.

Kubernetes rising as the production layer in the AI era

Rather than AI breaking at a specific point, it is more accurate to say that it breaks down in the seams between fragmented environments. That is why Kubernetes, which binds those environments into a single operational layer, is emerging as the production layer of the AI era.
