
Why modernization is defining the next decade of cloud

Cloud adoption is no longer the differentiator it once was. Over the past decade, enterprises have moved aggressively to the cloud to improve scalability, reduce infrastructure constraints, and accelerate innovation. Today, most organizations operate in hybrid or multicloud environments, and cloud has become the baseline rather than a competitive advantage.

What separates leaders now is not whether they are in the cloud, but how effectively they modernize and operate it.

Many enterprises are discovering that their current environments, while technically in the cloud, still reflect legacy design decisions. Applications may have been lifted and shifted without being re-architected. Data remains fragmented across systems. Operations are often managed through manual processes and disconnected tools. These limitations restrict agility and prevent organizations from fully realizing the value of their cloud investments.

Modernization addresses this gap. It is not simply a technology upgrade, but a shift in how applications, data, and operations are designed to support continuous innovation. Organizations that modernize effectively can improve performance, increase resilience, and create the foundation required for advanced capabilities such as artificial intelligence.

A key driver behind this shift is the growing importance of data. As enterprises invest in AI and analytics, the ability to access, govern, and activate data across environments becomes critical. Without a modern data foundation, AI initiatives struggle to scale and deliver consistent results. Data that is siloed, inconsistent, or difficult to access limits both operational efficiency and decision-making.

At the same time, application modernization is becoming essential. Legacy applications are often not designed to take advantage of cloud-native capabilities such as elasticity, automation, and microservices architectures. Modernizing these applications enables faster development cycles, improved scalability, and better alignment with evolving business needs.

However, modernization is not limited to applications and data. It also requires a transformation in how cloud environments are operated. Many organizations still rely on reactive operating models, where teams respond to issues as they arise. As environments grow more complex, this approach becomes increasingly difficult to sustain.

In fact, as explored in Why Cloud Innovation Slows, many enterprises find that outdated operational approaches create friction, slow delivery, and increase costs, even in modern cloud environments. Moving toward more proactive, automated operations is a critical component of successful modernization.

This evolution is being accelerated by the rise of AI. Organizations are not only building AI capabilities but also embedding intelligence into how systems are managed and optimized. AI-driven operations can help identify inefficiencies, automate routine tasks, and improve overall system performance. As a result, modernization efforts are increasingly tied to broader AI strategies.

The benefits of modernization extend beyond technology. Organizations that modernize effectively are better positioned to respond to market changes, launch new products, and improve customer experiences. They can operate with greater efficiency while maintaining the flexibility needed to adapt to new opportunities.

However, the path to modernization is not always straightforward. It requires careful planning, clear priorities, and alignment across teams. Enterprises must balance the need to maintain existing systems with the need to invest in future capabilities. This often involves making strategic decisions about which applications to re-architect, which to retire, and how to integrate new technologies into existing environments.

Partnerships can play an important role in this process. Organizations benefit from working with providers that bring both technical expertise and operational experience. This helps reduce risk, accelerate timelines, and ensure that modernization efforts are aligned with business outcomes.

For CIOs and technology leaders, the message is clear. The next phase of cloud is not about adoption. It is about evolution. Modernization is the mechanism that enables organizations to move from simply running workloads in the cloud to fully leveraging its capabilities.

As cloud environments continue to grow in complexity, the ability to modernize effectively will determine which organizations can innovate at scale. Those that invest in modern architectures, unified data foundations, and intelligent operations will be better positioned to compete in the years ahead.

Modernization is the foundation for agility, resilience, and intelligence — and the gateway to becoming an AI-ready enterprise. Discover how to modernize your applications, infrastructure, and data in ways that help your organization drive continuous innovation.

Download our e-book: Modernization Without Limits: Building the AI-Ready Enterprise

Intel, behind in AI chips, bets on quantum and neuromorphic processors

Intel for years chopped critical products including CPUs, GPUs and networking gear to cut corporate fat and get back into shape.

Many cuts pre-date the appointment last year of Lip-Bu Tan as CEO. Now, Tan is placing a long-term bet beyond the current crop of AI chips and doubling down on quantum processors and neuromorphic chips, which survived Intel’s earlier product cuts.

Tan has now tapped company veteran Pushkar Ranade to be Intel’s new chief technology officer, with a mission to drive developments in “quantum computing, neuromorphic computing, photonics, and novel materials,” the chipmaker announced this week.

The move is a longer-term bet, according to Dylan Patel, CEO of semiconductor research firm SemiAnalysis. “It’s a bit further out stuff he is doing, so it wouldn’t help with the next two years of products,” he said, adding that Ranade is an excellent choice for Intel’s move into future computing models.

Multiple analysts said Intel’s quantum group has been hindered by limited funding and resources and hurt by staff turnover. Former CEO Pat Gelsinger and CTO Greg Lavender departed the company last year.

Quantum uncertainty

There’s very little known about Intel’s quantum computing efforts. The company’s most recent quantum chip, Tunnel Falls, was announced back in 2023.

But there’s leadership continuity, with quantum hardware leader James Clarke and quantum systems and software leader Anne Matsuura still at the company.

“Maybe this means Lip-Bu wants to [reorient] Intel’s focus and investment in quantum computing,” said Jim McGregor, principal analyst at Tirias Research.

Intel has a solid record of success with technology moonshots, and its neuromorphic chip development is the best in the business, said Ian Cutress, chief analyst at semiconductor consulting firm More Than Moore.

“Intel’s [quantum] approach, since [former CEO Robert] Swan took over, to be honest, has been a lot less public. They would need to match — if not surpass — to develop their current quantum technologies beyond their competitors,” he said.

One of those competitors, IBM, is far ahead with its quantum efforts. The company has a quantum cloud available for rental now and a mature product plan for the next several years. IBM has “an open roadmap to 2033, which they’ve been working on since 2022, and every year they’ve been hitting their targets like clockwork,” Cutress said.

Intel’s investment arm, Intel Capital, recently invested $178 million in quantum processor company QuantWare. But an investment by Intel Capital doesn’t always mean Intel adopts a technology.

Nonetheless, enterprises should take a measured view of Intel’s pivot and “always take emerging technology talk with a grain of salt,” said Cutress. He argued that the long legacy of digital computing architecture is difficult to unseat.

“The reality is that any technology that comes from this side of R&D is going to work alongside current high-performance hardware, not replace it,” he said.

The hardware stack will likely look like a combination of CPU, GPU, and quantum computing chips in a datacenter, not just a quantum processor working on its own, Cutress said.

“IBM, Google, Microsoft have realized this and are pivoting those messages,” he said.

Quantum processors and AI supercomputers naturally complement each other, said Pranav Gokhale, co-founder and CTO of Infleqtion, which makes quantum processors. “Quantum computers can access physics that is difficult for classical machines to emulate, while GPUs provide the scale and throughput needed for control and learning.”

Intel’s spin-qubit quantum technology, which differs from IBM’s superconducting qubits, may be interesting technology, but many companies — including Quantum Motion, Silicon Quantum Computing, Photonic, and CEA-Leti — are pursuing similar approaches.

Quantum advantage

Still, Intel has a manufacturing advantage.

“Intel’s approach to CMOS spin qubits has one advantage over many other solutions — you can put millions on a wafer, and Intel has reliable manufacturing to do so,” Cutress said.

The appointment of Ranade, who has served in key manufacturing roles, to CTO is another clear sign that the foundry is Intel’s future. Former CTO Greg Lavender was seen more as a software person.

“He’s a process node guy, he knows what the process needs to work for customers, both internal and external,” Cutress said of Ranade.

Intel did not immediately respond to a request for comment on its plans.

This article first appeared on Network World.

Cloud modernization is advancing. Utilization isn’t

At Datadog, an observability and security platform for cloud applications, I work on research studies that analyze anonymized infrastructure telemetry from thousands of production environments across Kubernetes, managed container platforms and serverless services across cloud providers. The datasets span multiple cloud providers and billions of workload hours. Much of that work goes into our annual reports on container and serverless adoption, where we examine how organizations run workloads in modern cloud environments.

Over the past few years, one question kept coming up as we updated these reports: As cloud platforms become more granular and autoscaling adoption increases, does resource efficiency improve?

Going into this work, I didn’t have a formal hypothesis about utilization improving over time. But there was an implicit assumption—one that felt reasonable. As platforms became more granular and autoscaling adoption increased, resource efficiency should improve at least incrementally.

It didn’t.

When we compared successive editions of the research, including the 2023 Container Report and the 2025 State of Containers and Serverless, the answer was less straightforward than expected. The share of Kubernetes workloads running well below their requested CPU and memory levels remained broadly consistent between reports.

That persistence raises an uncomfortable question: If modernization alone doesn’t improve utilization, what does?

Rapid evolution in cloud infrastructure

Cloud environments today look markedly different from even three years ago.

In the 2023 Container Report, we found that over 65% of Kubernetes workloads were using less than 50% of their requested CPU and memory. That report examined container telemetry across thousands of production environments to understand how teams run Kubernetes workloads.

Two years later, the 2025 State of Containers and Serverless expanded the scope of the research to look at broader compute patterns, including the growing mix of containers and serverless, while continuing to analyze Kubernetes workloads.

Using the same <50% threshold for comparison, the overall utilization pattern remained similar. In October 2025, 72% of Kubernetes workloads were still using less than 50% of their requested CPU, and 62% were using less than 50% of their requested memory.

In other words, even as organizations adopted newer compute models and expanded autoscaling, most workloads continued to run well below their requested capacity.

At a surface level, the modernization between those report cycles is obvious: More granular compute models, broader instance diversification, increased use of managed services and deeper abstraction.

Looking only at platform capabilities and adoption trends, this appears to be steady operational maturity, the kind often discussed in CIO.com’s own coverage of cloud strategy.

If modernization alone were enough, we would expect to see measurable improvement in utilization patterns. The data suggests otherwise.

The utilization baseline barely moved

Using the same <50% threshold for comparison, the 2025 data shows a familiar pattern. In October 2025, 72% of Kubernetes workloads were using less than 50% of their requested CPU, and 62% (vs. 65% in 2023) were using less than 50% of their requested memory.

In other words, most workloads still operate well below their provisioned capacity.

Looking even closer, the distribution becomes more pronounced. In October 2025, 57% of workloads were using less than 25% of requested CPU, and 37% were using less than 25% of requested memory.

This is not marginal inefficiency at the edges. It reflects a large share of workloads running far below their requested baseline.
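To make the threshold framing concrete, here is a minimal Python sketch of how such shares can be computed from per-workload usage and request figures. The sample values are invented for illustration and have nothing to do with Datadog's actual telemetry schema or pipeline.

```python
# Illustrative only: compute what share of workloads fall below utilization
# thresholds relative to their requested resources. Values are hypothetical.
workloads = [
    # (avg CPU used in cores, CPU requested, avg memory used in GiB, memory requested)
    (0.12, 1.0, 0.8, 2.0),
    (0.45, 0.5, 1.9, 2.0),
    (0.05, 2.0, 0.3, 4.0),
    (1.60, 2.0, 3.5, 4.0),
]

def share_below(threshold: float, used_idx: int, req_idx: int) -> float:
    """Fraction of workloads using less than `threshold` of their request."""
    below = sum(1 for w in workloads if w[used_idx] / w[req_idx] < threshold)
    return below / len(workloads)

for t in (0.50, 0.25):
    print(f"<{t:.0%} of requested CPU:    {share_below(t, 0, 1):.0%}")
    print(f"<{t:.0%} of requested memory: {share_below(t, 2, 3):.0%}")
```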

When I saw those updated numbers in the 2025 report, I was a little surprised. Not because I expected perfection, since cloud systems are inherently uneven, but because I expected at least some measurable drift toward tighter provisioning as platform sophistication increased.

Instead, the overall distribution remained remarkably persistent.

To be clear, this does not imply that teams are careless or that modernization efforts failed. It suggests something more structural. Utilization behaves less like a short-term tuning issue and more like a stable characteristic of how systems are configured and operated over time.

A longitudinal comparison between the 2023 and 2025 data shows that individual workloads churn, clusters scale and instance types diversify, yet the aggregate distribution remains comparatively steady. That persistence stood out more than any single annual trend.

Importantly, the longitudinal data does not explain why that persistence exists. It only shows that modernization at the platform layer does not automatically reshape the utilization distribution.

At scale, persistent underutilization also has cost implications. Even if individual workloads appear inexpensive, conservative provisioning raises the baseline against which budgets are set.

Over time, that baseline becomes normalized, shaping cloud forecasts, contract negotiations and infrastructure investment priorities.

Averages hide persistence

Infrastructure data is rarely evenly distributed; it is long-tailed.

A relatively small number of workloads drive sustained utilization. A much larger number are bursty, intermittently active or lightly used. When averaged together, the system appears stable even when individual components are dynamic.

Averaging utilization metrics can therefore be misleading. An average implies symmetry. In practice, resource usage is asymmetric. Extreme values often drive cost and capacity exposure, while the median workload remains comparatively quiet. When those extremes are averaged away, the signals that matter most are softened.
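A toy example illustrates the point. The distribution below is synthetic, generated from a long-tailed (Pareto) draw purely to show how the mean, the median and the tail can tell very different stories about the same fleet.

```python
import random

# Illustrative only: a long-tailed utilization distribution where the mean
# hides how quiet the median workload actually is. Numbers are synthetic.
random.seed(7)
utilization = [min(random.paretovariate(1.5) * 5, 100.0) for _ in range(10_000)]

ranked = sorted(utilization)
mean = sum(utilization) / len(utilization)
median = ranked[len(ranked) // 2]
p95 = ranked[int(len(ranked) * 0.95)]

print(f"mean:   {mean:.1f}%")    # pulled up by a few heavy workloads
print(f"median: {median:.1f}%")  # what the "typical" workload looks like
print(f"p95:    {p95:.1f}%")     # the tail that drives cost and capacity exposure
```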

Partial instrumentation adds another layer. Not every workload produces the same performance and utilization data at the same level of detail. As organizations mix legacy systems with newer managed services, visibility gaps are common. Those gaps can skew aggregate metrics and create a false sense of stability or efficiency.

CIOs encounter similar issues when interpreting other aggregate metrics such as average latency, mean time to recovery or blended cloud spend. As CIO.com has noted in discussions of meaningful metrics, aggregation can obscure operational reality.

In infrastructure, that obscurity can persist for years.

What “utilization” measures

Before interpreting the trend, it is important to clarify what these utilization metrics measure.

In Kubernetes environments, utilization is typically measured relative to requested resources rather than raw machine capacity. Requests influence scheduling and reserve capacity on a node, shaping the baseline against which utilization is measured. But they also encode human judgment. Sometimes that judgment is based on load testing. Sometimes it reflects historical spikes. Sometimes it is simply conservative.

Two teams can run similar services and choose very different request baselines. The utilization metric will faithfully reflect that configuration choice.
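A small, hypothetical illustration of that point: two services with identical actual usage report very different utilization simply because their teams chose different request baselines.

```python
# Illustrative only: same actual usage, different request baselines. The
# utilization metric reflects the configuration choice, not efficiency.
actual_cpu_cores = 0.4  # observed average usage, identical for both services

requests = {
    "team-a": 0.5,  # sized from load tests
    "team-b": 2.0,  # sized conservatively for historical spikes
}

for team, requested in requests.items():
    utilization = actual_cpu_cores / requested
    print(f"{team}: requests {requested} cores -> utilization {utilization:.0%}")
# team-a: requests 0.5 cores -> utilization 80%
# team-b: requests 2.0 cores -> utilization 20%
```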

That is one reason I am cautious about treating utilization as a moral signal. It is a technical metric, but it is also a reflection of configuration decisions embedded over time.

Looking at it over time shows what changes and what stubbornly does not, even as platforms evolve.

Autoscaling isn’t the same as precision

One obvious question is whether autoscaling adoption should materially change these patterns.

Horizontal Pod Autoscaling (HPA) is common across Kubernetes environments and widely supported across platforms. This reflects broader ecosystem trends described in the CNCF Annual Survey.

But elasticity is not the same as precision.

Many autoscaling configurations still center on CPU and memory signals. More context-aware scaling, based on queue depth or application-level indicators, remains less prevalent. Vertical scaling is comparatively rare and often used in advisory modes rather than actively reshaping requests.

Workloads can scale up and down without necessarily altering their baseline request posture or the broader utilization distribution we observe.

Enabling elasticity is straightforward. Sustaining precision over time is much harder.
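A rough sketch of why, using the core HPA scaling rule (desired replicas scale with usage divided by the per-pod target). Even when horizontal scaling behaves perfectly, per-pod utilization hovers at or below the configured target, and the request baseline itself never changes. The numbers below are illustrative.

```python
import math

# Illustrative only: horizontal scaling adjusts replica counts, not the request
# baseline. With a 50% CPU target, a well-behaved HPA keeps each pod at or
# below half of its requested capacity by design.
cpu_request_per_pod = 1.0   # cores, fixed in the pod spec
hpa_target = 0.50           # desired utilization relative to the request

for total_demand in (0.8, 2.3, 6.0):  # cores of actual work to serve
    # Approximation of the HPA rule: replicas = ceil(usage / (request * target))
    replicas = max(1, math.ceil(total_demand / (cpu_request_per_pod * hpa_target)))
    per_pod_utilization = total_demand / (replicas * cpu_request_per_pod)
    print(f"demand={total_demand:.1f} cores -> {replicas} replicas, "
          f"{per_pod_utilization:.0%} of requested CPU per pod")
```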

Technical debt doesn’t disappear with new platforms

Another pattern that surfaced in the Container Reports is version lag. In both the 2022 and 2023 editions, a significant share of Kubernetes clusters were running versions approaching end-of-life even as newer releases were widely available.

Production systems rarely upgrade at the same pace as new platform capabilities are released. End-of-life versions persist. Premium support tiers extend. Older runtimes remain embedded even when more efficient versions are available.

Upgrades compete with feature delivery. Stability is prioritized. Risk is managed conservatively.

Version adoption does not directly determine utilization levels. But it reflects a broader dynamic: Configuration and upgrade decisions change more slowly than platform capabilities. When analyzed at scale, that inertia becomes visible. New tools layer onto existing systems, but earlier configuration assumptions often remain intact.

In practice, modern platforms often inherit older provisioning choices.

Capability is not outcome

Seeing the same utilization pattern persist across report cycles shifted my thinking. It was not about Kubernetes, serverless or autoscaling in isolation. It was about separating capability from outcome.

Cloud platforms today offer far more granularity than they did a few years ago. We can allocate resources in smaller increments. We can autoscale pods and nodes. We can mix execution models. We can diversify architectures.

None of that automatically changes the empirical shape of infrastructure usage.

Modernization creates new possibilities, but it does not automatically change how resources are used.

Across report cycles, it became clear that architecture was evolving faster than the underlying usage patterns.

That distinction has significant implications for how infrastructure performance—and investment decisions—are interpreted.

When platform evolution isn’t enough

If multiple years of visible platform evolution do not materially shift the utilization baseline, the constraint likely extends beyond feature availability.

What makes this pattern interesting is not that utilization is low in any single snapshot. It is that it remains low even as surrounding variables change. Platform capabilities evolve. Adoption curves shift. Workload composition becomes more heterogeneous. Yet the aggregate distribution remains comparatively stable.

That stability suggests something important: Modernization changes what is possible, but it does not automatically change how systems are configured or revisited over time.

For CIOs and senior technology leaders, the implication is not to pursue the next abstraction layer. It is to examine the decision frameworks that shape provisioning, headroom and risk tolerance year after year.

Cloud platforms will continue to evolve quickly. Whether utilization patterns change will depend less on new capabilities and more on how deliberately organizations revisit the assumptions embedded in their configurations.

This article is published as part of the Foundry Expert Contributor Network.

The triple squeeze: Why the SaaSpocalypse story you’re hearing is missing the most dangerous part

In early February 2026, nearly $285 billion in market value evaporated from software and related sectors in 48 hours. Atlassian dropped 36% for the month. The iShares Software ETF fell more than 30% from its September 2025 highs. Traders called it the “SaaSpocalypse.”

The popular narrative goes like this. AI coding tools have gotten so good that customers can build their own software, so why pay for a SaaS subscription when an engineer can vibe-code a replacement over a weekend?

That’s the least interesting version of what’s happening. The real story involves three forces converging on SaaS simultaneously, creating a structural trap that puts hundreds of thousands of white-collar jobs at risk. The force that will decide their fate isn’t AI. It’s a spreadsheet in a private equity office.

Force #1: AI isn’t replacing your product. It’s replacing the problem your product solves

Most enterprises won’t rebuild their tech stack with vibe coding, because that’s not how large organizations work. The bigger threat is that AI agents are making entire workflow categories obsolete. Take a SaaS ticketing product. The threat isn’t a competing ticketing system built in-house; it’s that customers are deploying AI agents to handle support directly, rethinking the pipeline from scratch. The old system isn’t replaced by a better one. It’s replaced by a fundamentally different approach to the job.

Satya Nadella telegraphed this on the BG2 podcast in December 2024, saying business applications would “probably collapse” in the agent era because they’re “CRUD databases with a bunch of business logic.” He added, “All the logic will be in the AI tier.”

The data backs him up. Gartner forecasts worldwide AI spending will hit $2.5T in 2026, up 44% YoY, while overall IT budgets grew ~10%. That money is coming from other budgets. Average SaaS apps per company dropped 18% between 2022 and 2024 (BetterCloud). Among large enterprises, 82% are actively reducing vendor count (NPI Financial). Even companies not directly losing customers face fewer new purchases, slower expansions and harder renewals, because buyers are looking somewhere else.

Force #2: The $440 billion leverage trap

Between 2015 and 2025, private equity acquired more than 1,900 software companies in deals worth over $440 billion. The thesis was elegant. Sticky recurring revenue, high margins, predictable cash flows and high switching costs, all perfect for leveraged buyouts. It worked brilliantly for a decade. Then it stopped.

  • The setup (2020-2022). Public SaaS traded at a median 18x revenue in 2021 (Asana touched 89x). PE paid premium multiples with enormous debt. Anaplan went to Thoma Bravo for $10.4B. Coupa sold for $8B with $4.5-5B in leverage. Zendesk went private for $10.2B backed by ~$5B in private credit.
  • The collapse. By late 2025, the median public SaaS revenue multiple had fallen to 5.1x, over 70% below peak. Private software M&A multiples dropped below 3x in 2024.

Here’s the math. A PE firm buys a $100M-revenue SaaS company in 2021 at 8x ($800M), financing 40% with floating-rate debt, a $320M loan at SOFR plus 500 bps. The initial rate runs 5-6%. After Fed hikes, about 10%, or $32M annual interest. Then the multiple collapses. Even if revenue grows to $120M, at 2-3x the business is worth $240-360M. The loan is $320M. Equity sits somewhere between negative and barely positive.
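Restated in a few lines of Python, using only the figures from the example above (the interest figure is approximate):

```python
# Illustrative only: the leverage arithmetic from the paragraph above.
revenue_at_buyout = 100e6
entry_multiple = 8.0
purchase_price = revenue_at_buyout * entry_multiple        # $800M
debt = 0.40 * purchase_price                               # $320M floating-rate loan

rate_after_hikes = 0.10                                    # SOFR + 500 bps, post-hikes
annual_interest = debt * rate_after_hikes                  # ~$32M a year

revenue_today = 120e6
for exit_multiple in (2.0, 3.0):
    enterprise_value = revenue_today * exit_multiple
    equity = enterprise_value - debt
    print(f"exit at {exit_multiple:.0f}x: value ${enterprise_value/1e6:.0f}M, "
          f"equity ${equity/1e6:+.0f}M (interest ${annual_interest/1e6:.0f}M/yr)")
# exit at 2x: value $240M, equity $-80M
# exit at 3x: value $360M, equity $+40M
```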

This isn’t hypothetical. Wells Fargo now uses “keys handover” for cases where PE hands underwater portfolio companies to lenders. A record $25B of software leveraged loans trade below 80 cents on the dollar. Total tech distressed debt sits near $46.9B. Apollo cut its software exposure nearly in half during 2025.

When equity is underwater, PE has two choices. Walk away or shift into margin-maximization mode by cutting headcount, consolidating and extracting cash.

Force #3: AI is the cost-cutting weapon PE has been waiting for

Here’s the cruel irony. AI is killing revenue, the debt still needs servicing and AI is also the most powerful cost-cutting tool ever handed to a PE operating partner.

Most SaaS employees are white-collar knowledge workers, including engineers, PMs, marketers, CS, sales, support and analysts, precisely the roles where AI is making its fastest inroads. Anthropic’s research found AI-exposed workers earn 47% more on average and are nearly 4x as likely to hold a graduate degree. Stanford Digital Economy Lab and Dallas Fed research shows employment among 22-25-year-olds in AI-exposed roles fell 13-16% between late 2022 and mid-2025, nearly 20% among young software developers.

Wall Street has picked its side. When Atlassian announced 1,600 layoffs (10% of workforce) to fund AI investment, the stock rose. When Block cut 4,000 jobs and Jack Dorsey said, “a significantly smaller team, using the tools we’re building, can do more and do it better,” the stock surged over 20%.

PE is moving too. Anthropic is reportedly in talks with Blackstone, Hellman & Friedman and Permira on a JV to embed Claude across portfolio companies. OpenAI is in parallel talks with Advent, Bain, Brookfield and TPG. Blackstone alone manages $1.3T+ across manufacturing, healthcare, real estate and financial services. Many licenses those companies cancel will belong to SaaS firms in other PE portfolios. As CNBC put it, “Private equity built the SaaS installed base. It may also be the one that rips it out.”

The loop closes. AI slows revenue, valuation collapses, debt becomes unsustainable and PE uses AI to cut headcount to service it. That’s the Triple Squeeze.

So, what can you actually do?

  • Assess exposure across three dimensions. First, your company. Is it PE-owned, and what vintage? Deals done at peak 2021-2022 valuations with heavy leverage are most precarious, and PitchBook or Crunchbase will tell you. Second, your role. Cost center or revenue engine? When growth stalls, PE defaults to margin maximization, and G&A, parts of marketing, internal tools and legacy product teams are vulnerable. Third, AI itself. How automatable is your day-to-day? If your core workflow is routing information, synthesizing documents or managing processes, the timeline is shorter than you think.
  • Supersize your T-shape. AI’s Achilles’ heel is scarce context. It doesn’t know your customers, your industry or why that one integration keeps breaking. Widen across adjacent roles while deepening your core with AI. Engineers can learn PM, UX and AI-assisted QA. Marketers can automate operational work with agents and build AI creative pipelines. Become an AI multiplier, someone who directs these tools with cross-functional judgment they can’t generate alone. If your employer isn’t giving you enough exposure, don’t wait. Vibe-code a side project. Pressure-test a financial model against your usual approach.
  • Build reputation while you still have a platform. Write publicly, contribute to communities, ship open source. Individual brand is a hedge against rising company-level risk, and far easier to build while employed than while competing with thousands of displaced workers.
  • If exposure is real, move early and deliberately. A wave of PE-backed SaaS layoffs would flood the market with experienced workers chasing a shrinking pool of roles. Those who fare best move while they can still be selective. But “move” doesn’t mean jumping to the first company with AI in its pitch deck. Apply the same structural thinking. Look for durable revenue, a real plan for AI-native competition, and profitability or a credible path.

The bottom line

The SaaSpocalypse narrative everyone’s debating, whether AI coding will kill SaaS, is a sideshow. The real story is financial, structural and already in motion.

Private equity spent a decade and $440 billion buying up software on a thesis that just broke. The debt doesn’t care about AI timelines or market sentiment. It comes due regardless. The only variable PE can control now is cost, and AI just made that variable dramatically easier to cut.

If you work in this industry, especially at a PE-backed company, it’s time for clear-eyed assessment of your exposure before the math makes the decision for you.

This article is published as part of the Foundry Expert Contributor Network.

The cloud migration fulfilling FC Bayern Munich’s AI ambitions

Management at Germany’s record football champions aims to optimize processes and provide new digital services using AI. Here, CIO Michael Fichtner discusses what the club’s IT department has implemented, and the advantages it will bring, both to the company internally and to fans around the world.

Why did FC Bayern migrate to SAP Cloud ERP Private?

Migrating to the cloud gives us access to innovation and other developments. Some SAP services are only available in the cloud environment, so these are now accessible to us. An important aspect was the simplified integration of other technologies or services predominantly or exclusively provided as cloud services.

Another important aspect was the realignment within IT. The migration allows us to focus more on process, application, and business innovation, and therefore on topics that’ll further develop and future-proof our company.

The use of highly available cloud infrastructures also provides us with additional security since in critical situations, we’ll benefit from professional backup and disaster recovery strategies. With all the dedication our employees have shown so far, this will be a further step toward professionalizing operations and further reducing risks.

In addition to security, scalability and flexibility are always important to us. Computing power, storage, and network resources can be scaled more quickly with a cloud provider. This is particularly significant in the frequent peak situations of our business model. For our projects, new systems like sandbox, test, and POC systems can be deployed faster and in a more standardized way, without requiring any investment or new equipment. Plus, security and compliance are becoming increasingly important for us. So migration allows us to leverage our partner’s established security features, and centrally managed access and authorization concepts simplify our operations. Certified data centers also directly support us to meet regulatory, association, and official requirements.

SAP’s strategy is moving consistently toward the cloud, and migration has allowed us to eliminate the risk of eventually having to rely on outdated on-premise technology. Through the migration we were also able to retire legacy tech and upgrade to modern, high-performance hardware.

How many applications or systems have been migrated to the cloud?

We migrated our multi-tiered SAP S/4HANA system. But before the migration, we worked together to consolidate our system landscape, merging 52 systems carrying fan data into S/4. There, the central fan database was established, the Golden Fan Record was built, and the data was combined into a redundancy-free, 360-degree view. So this approach was a significant milestone to implement our sovereign cloud strategy.

So we’ve only migrated one system physically, but in abstract terms, our phased approach allowed us to migrate data from all 52 systems to the cloud through consolidation, thus taking a big step toward controlled and consistent data sovereignty.

Which digital innovations does FC Bayern want to implement with the cloud?

Our business model is heavily influenced by peak situations like knockout phases in sporting competitions, live broadcasts, and special sales activities. In these situations, we need to not only scale technically, but provide innovative process solutions that reliably support peak loads.

Consider the short timeframes of ticket requests that must be processed during knockout stages. Or the launch of jerseys, where fans, even during peak periods, have the right to expect that goods will be delivered as quickly as possible. So in departments experiencing significant annual peaks in volume, it’s crucial employees receive highly automated support. Handling these seasonal peaks would otherwise be impossible.

We rely heavily on solutions supported by AI and digital agents, so developing them is always a joint initiative with our specialist departments.

What digital services and personalization strategies is FC Bayern planning to use to reach fans worldwide with the help of the new cloud platform?

Our aim is to address our fans in an individual, personalized way. The way forward is to move away from mass communication and large target groups or segments, and toward a personal approach, specifically tailored to the needs of each fan.

For this, we need the relevant data and ability to process large amounts of data in compliance with data protection regulations. This isn’t feasible without the appropriate infrastructure and scalability. We see personalized communication as a crucial element to remain relevant to our fans in the future. Mass mailings to fans via email, push notifications, or standardized content without specific relevance to the individual fan won’t help us remain attractive to them.

By providing targeted, relevant content, we want to further increase the attractiveness of FC Bayern Munich, and ensure the relationship with fans for the future.

What advantages do you expect from SAP Cloud ERP Private and AI?

A crucial factor in our decision to migrate was the conviction that we could significantly optimize our internal processes by using AI approaches. Specifically, we’re working on corresponding implementations in HR using SAP’s SuccessFactors and Concur. Initial approaches have also been developed and are being put in place in logistics and financial accounting. We expect this will allow us to increasingly automate more activities, freeing up colleagues in specialist departments to focus on specific tasks that require a particular approach or interaction. Ultimately, this will enable us to provide better service to fans as we gain time to address other issues.

What role did digital sovereignty or data sovereignty play in the decision to migrate to the SAP cloud?

Digital sovereignty, and control over our data and the data of our fans, have been of paramount importance for many years, and have guided our actions for just as long. Driven by this principle, we’ve developed and operated our key applications ourselves.

With the capabilities our partners have made available to us, we could implement these requirements in a sovereign cloud environment without compromising standards. So we’re confident we’ve not created any dependencies and will remain operational in the years to come. We’re convinced that the de facto and legal control of our critical data is sustainably ensured in our chosen setup.

Column | Practical control over flashy AI: the agent strategy Google laid out

The most notable announcement at Google Cloud Next 2026, the annual conference Google held last week, was not a new model or a new TPU. Nor was it yet another way to spread Gemini across the enterprise.

Rather, it reads as an admission, and at the same time something close to a warning.

Agents need supervision

We already knew this, but as the saying goes, to know and not to act is not truly to know; actually putting it into practice is another matter. We treat agents like busy digital employees getting work done, but they are also vulnerable software systems holding credentials, budgets, memory, and access to sensitive data. On top of that, they tend to fail in ways that are expensive and hard to trace.

That is the essential message of Google Cloud Next 2026. Many read it as Google moving to stake an early claim on the agentic enterprise market, but the more interesting reading is that Google showed up to bring it under control.

Of course, Google leaned hard into the “agentic cloud,” a theme no event skips these days. It also announced the Gemini Enterprise agent platform, an eighth-generation TPU (Tensor Processing Unit), new Workspace Intelligence AI features, and a range of integrations meant to weave AI naturally across the enterprise. As a celebration of the agent era’s achievements, the lineup was more than sufficient.

Strip away the showmanship, though, and a more important message emerges. For the past two years, enterprises have been enthralled with AI agents; now they have reached the stage where those agents must be controlled so they do not damage the company’s reputation, cause financial losses, or expose sensitive information.

This is not a criticism of Google. Quite the opposite: it may be the most practically valuable set of announcements from the event.

“Trust, but verify”

The moment AI moves beyond merely talking and starts taking real actions, essential questions pour out in any enterprise environment: Who approved this? What data did it use? Which systems did it touch? Why did it act that way? How much did it cost? And, if necessary, how do we stop it?

A large share of Google’s announcements were built as answers to those questions.

What Google chose to emphasize makes this clear. Knowledge Catalog is designed to supply trusted business context across enterprise data to ground agents’ judgments. Gemini Enterprise gained capabilities for managing and monitoring agents, including long-running ones.

Workspace added features to monitor, control, and audit agents’ data access, reducing the risk of prompt injection, oversharing, and data exfiltration. Google Cloud also introduced agent-defense capabilities and Wiz-based security to protect agents across cloud and AI development environments.

These are not tools you need when a system works perfectly. They were built for enterprises facing the practical question: it worked in the demo, but can we trust it with real work?

The agent management layer

Industry analysts are gradually converging on the term “agent control plane” to describe this new layer of enterprise AI. It is an apt phrase because the concept is familiar: much as Kubernetes unifies the management of infrastructure, it evokes a platform that centrally governs the behavior of AI agents, a single system for managing and observing many agents in one place and handling routing, security, and optimization.

But reality is still far from that stage.

Agents need a control plane not because they are already replacing employees. It is because enterprises, as they wire probabilistic agents into existing deterministic business processes, are realizing that someone has to manage the seam between the two. Autonomy looks tidy in an agent demo; in real enterprise systems, things get far messier.

Customer data sits in one system, contract information in another, exception handling lives in someone’s inbox, and the policy document is a PDF last updated in 2021. The person who actually understood the workflow may well have left during the pandemic.

Into that already complicated environment, agents are now being added.

That is why I sympathize with parts of Google’s control-plane strategy while remaining wary of overly tidy vendor narratives. A unified agent platform, governance, monitoring, evaluation, observability, and simulation are all needed. Gemini Enterprise matters in particular as an attempt to centralize the messy operational pieces enterprises have been stitching together on their own.

But the control plane should not be mistaken for the actual work itself.

Pilots are easy, production is hard

The data on agentic AI keeps repeating one message: expectations are running well ahead of operational maturity.

According to the 2026 state of agentic orchestration and automation report from process automation vendor Camunda, 71% of organizations said they are using AI agents, but only 11% put them into production over the past year. And 73% admitted there is a gap between their vision for agentic AI and reality.

Gartner offers a similar outlook: more than 40% of agentic AI projects are expected to be scrapped by the end of 2027, with cost, unclear business value, and inadequate risk controls cited as the reasons.

To be clear, these are not model problems. They are classic enterprise software operations problems.

The same pattern shows up in security and governance. A 2026 survey by generative AI platform Writer found that 67% of executives had experienced data leaks or security incidents caused by unsanctioned AI tools.

In addition, 36% said they have no formal plan for overseeing AI agents, and 35% said they cannot immediately shut down an agent when something goes wrong.

Of those figures, the last is the most worrying. These are software agents with access to enterprise systems, customer data, and the organization’s credentials, yet more than a third of companies are not confident they could shut one down quickly if a problem arose.

And we are supposed not to worry?

The agent is the least important part

The hidden truth of the agentic enterprise is that the agent itself may be the least important element of the architecture. All the attention and anticipation go to the agent, but the real work lies elsewhere: authentication and authorization, workflow boundaries, data quality, retrieval and memory, evaluation, audit trails, cost controls, and deciding which system serves as the source of truth when an agent gets confused.

The announcements at Google Cloud Next did not prove that the agentic enterprise has arrived. What they showed is that if it does arrive, it will look a lot like existing enterprise software when it reaches the stage where it really matters: it converges on governance rather than magic.

That is progress, but it is not glamorous progress.

If you want to pick winners in the agentic AI market, look less for the company with the smartest agents and more for the one with clear data contracts, rigorous evaluation, a coherent identity model, and minimal shadow AI sprawl. The industry tends to avoid that conversation, because talking about autonomous digital workers is far more exciting than talking about data lineage and access control.

But that boring part is exactly where enterprise software becomes real.

There is another reason it is hard to declare the agent era prematurely: an agent is only as useful as the data it can safely understand and act on. Google clearly recognizes this. Its agentic data cloud concept, including Knowledge Catalog and the cross-cloud lakehouse strategy, is an acknowledgment that agents need trusted business context.

Without that context, an agent is not an enterprise operator; it is an articulate tourist wandering through your systems.

In the end, the most encouraging announcements at Google Cloud Next were not the ones that make agents more autonomous, but the ones that make them more manageable. Agentic AI holds enormous promise, but to become real it first has to prove it can be boringly reliable.
dl-ciokorea@foundryco.com

Designing the AI-native cloud: What enterprise architects are learning the hard way

A few years ago, enterprise cloud conversations followed a familiar pattern. Teams discussed migrating legacy applications, modernizing infrastructure and reducing data center costs. The goal was clear: Move workloads to scalable cloud platforms and gain operational flexibility.

But in recent months, the tone of these conversations has shifted dramatically.

In architecture reviews and infrastructure planning sessions I’ve participated in, the questions now sound very different:

  • Where will the model training run?
  • Do we have access to GPU clusters?
  • Can our data pipelines support real-time inference?

The reason is simple: Artificial intelligence — particularly generative AI — is pushing enterprise infrastructure beyond what traditional cloud architectures were designed to handle. What many organizations are discovering is that the future isn’t just cloud-first. It’s AI-native.

When AI becomes the workload that breaks the cloud

In many organizations, the turning point arrives when a team attempts its first large-scale generative AI deployment.

A business unit might want to build a document intelligence system, an internal knowledge assistant or a predictive analytics platform powered by large language models. On paper, this looks like just another cloud workload. But implementation quickly reveals the difference.

AI workloads behave nothing like traditional enterprise applications. They require massive datasets, GPU-accelerated compute and high-throughput data pipelines capable of feeding machine learning models continuously. Infrastructure designed for transactional systems often struggles under these conditions.

I’ve seen teams discover this firsthand when their existing cloud environments suddenly become bottlenecks — not because of application traffic, but because of AI model training workloads. This is the moment many organizations realize: AI isn’t just another application in the cloud. It’s a new infrastructure paradigm.

In some cases, even well-architected microservices environments fail to keep up, exposing limitations in storage I/O, network latency and workload isolation. These hidden constraints often only surface under sustained AI workloads, making them difficult to predict during initial planning phases.

AI-native infrastructure: GPU clusters and high-performance compute

Traditional enterprise cloud environments were optimized for CPU-based workloads and transactional applications. AI systems, by contrast, prioritize GPU-accelerated compute, high-bandwidth networking, distributed storage and scalable training pipelines.

Tools like AMD ROCm highlight this shift toward GPU-native ecosystems, offering a full-stack platform designed specifically for high-performance AI workloads. But adopting GPU infrastructure is not just about provisioning capacity — it is about using it efficiently.

Many organizations underestimate the complexity of GPU scheduling, memory fragmentation and workload contention. Unlike CPU workloads, which can be easily distributed, GPU workloads require careful orchestration to avoid underutilization.

These platforms demonstrate that AI workloads are reshaping how cloud infrastructure is designed — from CPU-centric compute layers to AI-native architectures optimized for massive parallelism and high-throughput data processing.
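A toy sketch of the fragmentation problem mentioned above: a naive first-fit scheduler strands memory across GPUs that a smarter packing would have used. The GPU sizes and job footprints are hypothetical.

```python
# Illustrative only: two 40 GB GPUs can hold all four jobs (15+25 GB on each),
# but naive first-fit placement in arrival order strands capacity.
gpu_free_gb = [40, 40]
job_memory_gb = [15, 15, 25, 25]   # arrival order matters for a naive scheduler

def first_fit(free, jobs):
    free = list(free)
    placements = []
    for job in jobs:
        for i, avail in enumerate(free):
            if job <= avail:
                free[i] -= job
                placements.append((job, i))
                break
        else:
            placements.append((job, None))   # no GPU has enough free memory left
    return placements, free

placements, leftover = first_fit(gpu_free_gb, job_memory_gb)
for job, gpu in placements:
    target = f"GPU {gpu}" if gpu is not None else "UNSCHEDULED"
    print(f"{job} GB job -> {target}")
print("free memory left per GPU:", leftover)   # capacity stranded by fragmentation
```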

Additionally, emerging innovations such as specialized AI accelerators and custom silicon are further complicating infrastructure decisions. Architects must now evaluate not just performance, but portability and vendor lock-in when selecting hardware strategies.

The rise of distributed AI across hybrid environments

Another pattern emerging in enterprise AI deployments is the move toward distributed infrastructure.

Early cloud adoption encouraged organizations to consolidate workloads within a single cloud provider. This simplified governance and reduced operational complexity.

But AI workloads often introduce new constraints. Certain datasets must remain within private infrastructure for compliance reasons. Training large models requires specialized GPU clusters available only in specific cloud regions. Real-time inference may need to run close to where data is generated. As a result, many enterprises are now operating hybrid and multi-cloud AI environments.

Platforms such as Google Cloud Vertex AI are explicitly designed for hybrid AI pipelines, enabling organizations to train and deploy models across on-premises systems and multiple cloud environments.

In these environments, AI is not confined to a single cloud environment. Instead, intelligence is distributed across infrastructure layers.

The challenge shifts from deploying applications to orchestrating AI systems across multiple environments.

This distribution also introduces new challenges around data consistency, model versioning and latency management. Ensuring that models behave consistently across environments becomes a critical requirement, particularly in regulated industries.

Intelligent orchestration is becoming essential

As AI infrastructure grows more complex, manual cloud management becomes increasingly impractical.

Modern enterprise environments can involve thousands of containers, distributed datasets and multiple compute clusters running across different cloud platforms.

To manage this complexity, organizations are beginning to rely on intelligent orchestration platforms. These systems use machine learning to monitor infrastructure usage, predict compute demand and dynamically allocate resources.

Frameworks like UCUP illustrate the next generation of orchestration — systems capable of coordinating multiple AI agents, monitoring performance and adapting execution strategies in real time. These platforms move beyond simple scheduling into intelligent decision-making layers.
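As a hedged illustration of the general idea (not UCUP or any specific platform), the sketch below forecasts demand with a simple moving average and provisions capacity with headroom; real systems use far richer models and signals.

```python
from collections import deque

# Illustrative only: a toy predict-then-allocate loop. The window size and
# headroom factor are arbitrary placeholders.
recent_gpu_demand = deque(maxlen=6)   # rolling window of observed GPU-hours per interval
HEADROOM = 1.2                        # provision 20% above the forecast

def record_and_plan(observed_demand: float) -> int:
    """Record the latest observation and return the GPUs to provision next interval."""
    recent_gpu_demand.append(observed_demand)
    forecast = sum(recent_gpu_demand) / len(recent_gpu_demand)  # moving average
    return max(1, round(forecast * HEADROOM))

for demand in [8, 10, 14, 30, 28, 12]:   # e.g. a training job spikes mid-series
    gpus = record_and_plan(demand)
    print(f"observed {demand:>2} GPU-hours -> provision {gpus} GPUs next interval")
```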

Ironically, artificial intelligence is not only transforming enterprise workloads — it is also becoming the system that manages cloud infrastructure itself.

Over time, this may lead to largely autonomous infrastructure environments where human operators focus more on policy and oversight than direct system management.

The cost reality of enterprise AI

For all the innovation AI promises, the financial implications are impossible to ignore.

Large language models require enormous computational resources. GPU clusters are expensive and often scarce. Training a single model can consume substantial cloud budgets.

This has forced many organizations to rethink their financial approach to cloud computing.

Practices such as FinOps — which focus on managing and optimizing cloud spending — are becoming essential in AI-driven environments.

Teams are experimenting with strategies such as:

  • Model optimization and compression
  • Distributed training architectures
  • Serverless inference models
  • Workload scheduling across cost-efficient regions

In some cases, organizations are even reconsidering hybrid strategies that bring certain AI workloads back on-premises when economics favors private infrastructure.
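For a sense of how that trade-off gets evaluated, here is a deliberately simplified break-even sketch. Every price and parameter in it is a hypothetical placeholder; the point is the structure of the comparison, not the numbers.

```python
# Illustrative only: rough monthly cost of renting cloud GPUs vs. amortizing
# purchased hardware for a steady workload. All figures are hypothetical.
cloud_price_per_gpu_hour = 3.00         # hypothetical on-demand rate
gpus_needed = 8
utilization_hours_per_month = 500       # how many hours the job actually runs

onprem_capex = 8 * 30_000.0             # hypothetical purchase price for 8 GPUs
onprem_monthly_opex = 2_500.0           # power, cooling, hosting, support (hypothetical)
amortization_months = 36

cloud_monthly = cloud_price_per_gpu_hour * gpus_needed * utilization_hours_per_month
onprem_monthly = onprem_capex / amortization_months + onprem_monthly_opex

print(f"cloud:   ${cloud_monthly:,.0f}/month")
print(f"on-prem: ${onprem_monthly:,.0f}/month (amortized over {amortization_months} months)")
print("cloud wins" if cloud_monthly < onprem_monthly else "on-prem wins",
      "at this utilization level")
```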

AI innovation, it turns out, requires as much financial architecture as technical architecture.

FinOps teams are increasingly collaborating directly with data scientists and ML engineers, creating a new cross-functional discipline focused on balancing performance with cost efficiency.

The emergence of the AI-native enterprise cloud

Perhaps the most significant shift underway is conceptual.

For more than a decade, the cloud served primarily as infrastructure for hosting applications.

But AI is transforming the cloud into something far more powerful.

It is becoming a platform for machine intelligence.

Instead of simply running software, cloud environments are now supporting systems that learn from data, generate insights and automate decisions.

Forward-looking organizations are beginning to design their infrastructure with this reality in mind.

They are not just migrating workloads.

They are building AI-native cloud ecosystems designed to support data-driven intelligence at scale.

This also means embedding AI considerations into every layer of architecture — from data ingestion and storage to security, compliance and user experience.

The next chapter of enterprise cloud architecture

The first wave of cloud transformation focused on modernization.

The next wave is about enabling intelligent systems that augment human decision-making, automate operations and unlock entirely new digital capabilities.

That shift is forcing enterprise architects to rethink the foundations of cloud infrastructure — from compute architecture and data pipelines to orchestration and governance.

The organizations that adapt fastest will not simply run AI workloads in the cloud.

They will build cloud environments designed specifically for intelligence.

And in the process, they will define what the next generation of enterprise infrastructure looks like.

Those that fail to adapt, however, risk being constrained by legacy architectural assumptions that no longer align with the demands of AI-driven innovation.

This article is published as part of the Foundry Expert Contributor Network.

Enterprises are rethinking Kubernetes

For years, Kubernetes has occupied an almost mythical place in enterprise IT. It has been positioned as the control plane of the future, the standard abstraction for cloud-native systems, and the platform that would finally free enterprises from infrastructure lock-in. To be fair, some of that is true. Kubernetes has brought discipline to container orchestration, enabled portable deployment models, and given architects a powerful framework for managing distributed applications at scale.

The market is shifting, however, and so are enterprise expectations. The question is no longer whether Kubernetes is technically impressive. It clearly is. The question is whether it still represents the best option for a growing number of mainstream enterprise use cases. In many cases, the answer is increasingly “no.” What we are seeing is not the death of Kubernetes, but the end of its unquestioned dominance as the default strategic choice. Here is why.

Too expensive to operate

As Kubernetes adoption grew, many organizations were reluctant to admit that it introduces operational complexity and requires specialized skills, constant tuning, and solid governance. Running Kubernetes well takes mature engineering, observability, security, networking, and lifecycle management: far more than a side project. Many underestimated that burden.

What looked elegant in architecture diagrams has become a real burden for operations teams. Clusters have multiplied. Toolchains have sprawled. Upgrades have become risky. Policy enforcement has become an engineering discipline in its own right. Enterprises have realized they were not merely adopting an orchestration platform; they were building and maintaining an internal product that demands sustained investment and scarce specialized expertise.

That may be acceptable for digital-native companies whose scale and complexity justify the effort. It is a much harder sell for enterprises that want reliable deployments, resilient applications, and reasonable cloud costs. In those cases, Kubernetes can look like overengineering dressed up as strategic modernization. When a company spends more time managing the platform than delivering business value on top of it, the novelty wears off fast.

Portability matters less

Kubernetes has been marketed as protection against vendor lock-in, letting applications run on premises, in the cloud, and at the edge. In practice, most enterprises faced ecosystem dependencies (storage, networking, security, identity, observability, CI/CD, managed services, and cloud-native databases) that created practical lock-in Kubernetes never removed.

What enterprises gained in workload portability they often lost in ecosystem complexity. They standardized on Kubernetes while still depending heavily on the managed services and operational conventions of one particular cloud provider. The result has been a strange middle ground: all the complexity of a highly abstracted platform without the full simplicity of consuming opinionated, end-to-end native services.

This matters more now because boards and executive teams are less interested in theoretical architectural optionality and more focused on measurable business outcomes. They want speed, resilience, cost control, and lower risk. If a managed application platform, a serverless environment, or a vendor-specific platform-as-a-service offering gets them there faster, many are willing to accept some level of dependence. Enterprises are becoming more honest about the trade-offs. They are realizing that strategic flexibility is valuable, but not at any price.

This is where Kubernetes starts to lose favor. Portability has value, but for many enterprises it has not justified the operational and organizational burden it carries. The promise has outrun the actual performance.

Better abstractions are gaining ground

Perhaps the most important shift is that enterprises are moving away from buying raw technical primitives and toward consuming higher-level platforms that align better with developer productivity and business outcomes. Platform engineering teams increasingly hide Kubernetes behind internal developer platforms. Public cloud providers keep improving managed container services, serverless offerings, and integrated application environments that reduce manual infrastructure management. Developers, for their part, do not want to become part-time cluster operators. They want fast paths to build, deploy, secure, and monitor applications without stitching together a dozen components.

In other words, Kubernetes may still be there under the hood, but it is increasingly less visible and less central to strategic purchasing decisions. That is usually a sign of maturity. Technologies go from being the headline to being the plumbing. Enterprises are not asking “How do we adopt Kubernetes?” as often as they are asking “What is the fastest, safest, most cost-effective way to deliver modern applications?” That is a much better question.

The answer increasingly points toward curated platforms, opinionated development environments, and managed services that abstract Kubernetes rather than expose it. This is not a rejection of cloud-native principles. It is a rejection of unnecessary cognitive load. Enterprises are deciding they do not need to control every layer of complexity to reap the benefits of modern architecture.

Giving up the spotlight

None of this means Kubernetes is going away. It remains important for large-scale, heterogeneous, highly customized environments. It remains an excellent choice for organizations with deep platform maturity, regulatory constraints, or sophisticated multicloud operational needs. But that is a smaller segment of the market than the hype cycle once suggested.

What is falling out of favor is not Kubernetes as a technology, but Kubernetes as the unquestioned enterprise default. That distinction matters. Enterprises are becoming more selective about where to accept complexity and where to avoid it. They are less inclined to romanticize infrastructure and more willing to choose simplicity where it exists.

That is probably a good thing. The job of enterprise architecture is not to admire elegant technology for its own sake. It is to align technology decisions with operational realities, economic constraints, and business outcomes. By that standard, Kubernetes still has a place, but it no longer gets a free pass.

Deconstructing the data center: A massive (and massively liberating) project

A few years back, Bhaskar Ramachandran read the tea leaves and what he saw was clear: With all the enhancements hyperscalers continuously make, there was no value in having on-premises data centers any longer.

“There is just no way for a private company to match that,” says Ramachandran, global vice president and CIO of paints and coatings manufacturer PPG. “This is their business, and they’re really good at it, and it was clear that the size of the hyperscalers is just going to win over the infrastructure game. So it didn’t make sense for us to keep up with the infrastructure.”

PPG began dismantling its eight global data centers about four years ago, with the final one completed in November 2025. For a 143-year-old company that has gone through 60-some acquisitions, that was no small feat.

Applications and infrastructure had become a lot to manage, especially when combined with maintaining a strong cybersecurity posture and compliance. "You can't consistently manage this sort of a footprint, and it becomes really unwieldy very quickly," Ramachandran says.

Decommissioning a data center is like defusing a complex bomb. Every wire, sequence, and step must be handled with care, because one wrong move can hurt the organization through downtime, data breaches, or a hit to the bottom line.

“The decommissioning of data centers is underestimated in terms of complexity, financial risks, reputation loss, and data exposure,” according to Gartner. The firm estimates that by 2030, twice as many enterprise data centers will have been decommissioned compared to those built. Reasons include consolidations, obsolescence, and shifting workloads to cloud and colocation services.

The inadvertent data center

In some instances, data centers have cropped up without much forethought. “Most organizations I work with didn’t build a data center intentionally — they grew into one,” says Aaron Walker, CEO of IT consultancy Overbyte, and a former associate partner at IBM Consulting. “A rack in a closet became a row in a repurposed room, and suddenly, you have a facility that was never designed for the job holding years of infrastructure decisions.”

Deconstructing that environment is work that often gets overlooked, says Walker.

He recently consulted with a large, fully remote online school in the throes of this process. The deconstruction work began with a full audit of what systems existed. From there every workload was categorized to determine what gets migrated, what gets moved to cloud-native infrastructure, and what gets retired entirely, he says.
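
As an illustration of that categorization step, the sketch below shows one way a migration team might record workloads with an owner, dependencies, and a disposition, then order the waves so the simplest systems move first. The categories, fields, and example entries are hypothetical, not Overbyte's actual tooling.

```python
# Hypothetical sketch of an inventory-and-disposition pass: every workload gets an
# explicit owner, dependency list, and a disposition (rehost, re-platform, or retire).
from dataclasses import dataclass, field
from enum import Enum


class Disposition(Enum):
    REHOST = "migrate as-is"
    REPLATFORM = "move to cloud-native infrastructure"
    RETIRE = "decommission"


@dataclass
class Workload:
    name: str
    owner: str
    dependencies: list[str] = field(default_factory=list)
    complexity: int = 1          # 1 = trivial, 5 = heavily entangled
    disposition: Disposition = Disposition.REHOST


def plan_waves(workloads: list[Workload]) -> list[Workload]:
    """Order work so simple, well-understood systems move first and
    retired systems drop out of the migration plan entirely."""
    active = [w for w in workloads if w.disposition != Disposition.RETIRE]
    return sorted(active, key=lambda w: (w.complexity, len(w.dependencies)))


inventory = [
    Workload("marketing-site", "web team", complexity=1),
    Workload("erp-reporting", "finance IT", ["erp-core"], complexity=4),
    Workload("legacy-fax-gateway", "ops", complexity=2, disposition=Disposition.RETIRE),
]
for w in plan_waves(inventory):
    print(f"{w.name}: {w.disposition.value} (owner: {w.owner})")
```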

Then came the physical side: decommissioning hardware and deciding what equipment had residual value and what to recycle. 

“The timeline pressures are real,” Walker says. “You can’t just power things down. Dependencies surface that nobody documented.”

The IT organizational side had its own challenges. “People have years of institutional knowledge tied to physical systems, and there’s genuine anxiety about dismantling something they built and maintained,” he says. 

Walker’s team also ran into issues trying to upgrade systems during the migration, which is generally a mistake, he says. “A data center deconstruction is already a significant change event, and layering additional upgrades on top of it introduces unnecessary risk. In most cases, it is better to separate modernization from migration.”

From start to finish, the deconstruction ran about a year, but timing will vary from project to project, he says.

Less hassle, more flexibility

When the time came for digital marketing agency Helium SEO to consider what to do with its data center, CTO Paul DeMott says the math was simple. “We were paying $12,000 a month toward the colocation fees, hardware support, and the maintenance cost for the physical servers sitting in racks. Cloud infrastructure promised better reliability, automatic scaling, and way less hassle once we were done moving everything.”

The most compelling reason to rid itself of a physical footprint, though, was flexibility. Physical servers meant capacity planning six months ahead, DeMott says, and if the team needed more resources, IT had to wait weeks for hardware to arrive and be installed.

“Cloud allows resources to be spun up in minutes and shut down at the same speed,” he says. “We went from buying expensive hardware that depreciated to purchasing what we are actually using.”

IT began by creating a list of all the apps running on physical servers and classifying them according to how difficult it would be to move them. “Simple web apps moved first as they barely needed changes,” DeMott says. “Databases and anything which stores data — that’s a little bit later because we’d have had to plan the migration well.”

Some older apps had to be changed to work on the cloud, he adds. The actual move took place over six months, and IT decommissioned the data center while deploying apps to the cloud in tandem, moving the services step by step with backup plans for each one.

Still, the process wasn't seamless. "Transferring 15TB of data to the cloud takes 72 hours on our internet connection, and that was the biggest problem," DeMott notes. IT ended up using AWS Snowball, a physical data transfer appliance, because it took staff weeks to upload everything, "and [it] ruined the performance in our network."

Another issue was figuring out the cloud costs, which DeMott characterizes as “brutal. Different types of servers, storage, data transfer costs made it almost impossible to budget,” he says. “Our first month bill accrued at 40% more than we estimated because we forgot about charges for moving data out of the cloud.”

It took IT three months of “fumbling” to get costs below what the company paid for the data center before things stabilized.
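
Some quick back-of-the-envelope math shows why bulk uploads and egress charges catch teams off guard. The figures below are purely illustrative; actual link speeds and cloud data-transfer-out rates vary by provider, region, and pricing tier.

```python
# Back-of-the-envelope math with illustrative numbers only (real link speeds and
# cloud egress rates vary by provider, region, and pricing tier).
def upload_days(terabytes: float, usable_mbps: float) -> float:
    """Days to push a dataset over a link, assuming the stated throughput is sustained."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (usable_mbps * 1e6)
    return seconds / 86400


def egress_cost(terabytes_out: float, dollars_per_gb: float = 0.09) -> float:
    """Rough data-transfer-out charge; $0.09/GB is an assumed illustrative rate."""
    return terabytes_out * 1000 * dollars_per_gb


# 15 TB over a 100 Mbps usable uplink is roughly two weeks of continuous transfer,
# which is why shipping a physical appliance can beat the wire.
print(f"{upload_days(15, 100):.1f} days")                  # ~13.9 days
print(f"${egress_cost(2):,.0f} to pull 2 TB back out")     # ~$180 at the assumed rate
```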

The power of ‘cloud only’

Once PPG made the decision to dismantle its data centers and move everything to the cloud, it was time to spread the word internally. “When you say, ‘cloud only,’ it makes it much easier for you to have conversations,” Ramachandran says. “It just sets the entire organization up on a single mission … just those two words make it very, very clear to everybody in the company what that means. There is no room for interpretation.”

The news was revealed at a global town hall, and initially, Ramachandran says, the sentiment was, "this too shall pass." Then people decided to get on board.

There were the typical organizational change management issues to deal with. Building momentum takes time, he says, but once the first data center was shut down, people came to the realization that “Okay, we are actually doing this,” Ramachandran says. “Then there was no resistance … everybody got on board, and things started to accelerate.”

Officials ensured that all the training IT needed was made available, and the company paid for everything, certifications included. "We recognized, in town halls, anybody that went through this training and got the certification. We celebrated people. We promoted people that did the things we wanted them to do," he says. All of this helped reinforce the mission.

"For the most part, business users didn't care; their apps were available and they didn't care where they were," Ramachandran says, although there were a couple of exceptions among more technically savvy employees who were concerned about workflow and the security implications of cloud. There was a perception among some that a data center was more secure, he says.

That led to looking at publicly available information on all the cybersecurity incidents in the recent past. The research indicated a clear pattern, he says.

"And the pattern is: The more significant cybersecurity events were actually happening to companies" that were largely on-prem environments, Ramachandran observes. "So you came to this point where the cloud actually became [a] lot more secure than on-prem infrastructure."

There are several reasons why, he maintains, including that, relatively speaking, it is a lot easier to implement security policies consistently in the cloud because “you have a single pane of glass enforcement of policies that you don’t have in an on-prem environment.”

This makes managing your attack surface area more straightforward, Ramachandran says. “So you put all of this together, you package it up on the presentation, and talk to those people one on one, and then say, ‘This is why.’”

The dismantling process

PPG works with a single hyperscaler for its business in China and three others elsewhere. Deciding which apps went where was largely a function of the technology and which hyperscaler "lends itself to that brand of technology versus the other." In instances where the choice wasn't clear, IT made the call.

Step one was deciding on an approach: whether to modernize applications at the same time as the deconstruction work. "When you pull together the business case to modernize applications, we came to a conclusion that if we do modernization on the application layer and the infrastructure layer at the same time, I would probably be retired by the time we migrated the data center," Ramachandran says.

That made it easy to decide when to do a lift and shift and when not to bother migrating certain applications, he says. Then IT could focus on other business priorities to modernize the workforce.

“We just adjusted our roadmap to say the new [app] would go straight into the cloud” while not bothering to move older workloads, Ramachandran says.

The human element

The next step was “finding the people that are hungry to do something new and probably have a bit of experience and … they are waiting for someone to say, ‘Hey, let’s do this,’” Ramachandran says of the data center deconstruction. “They are forward thinkers. Every organization in our scale has [them]. It’s identifying those people and then … empowering them. They became the leaders in the new infrastructure.”

Once the migration started, it was important to celebrate the wins. That got more people interested in being part of the new organization PPG was forming, called the Cloud COE (center of excellence).

The biggest mistake companies make is treating deconstruction as a single project instead of a phased operational shift, says Roland Parker, founder and CEO of Impress Computers, a managed IT services and cybersecurity firm in Houston.

“We walked one 200-person manufacturer through moving workloads in priority tiers — production-critical systems last, not first — which kept their floor running while we systematically eliminated physical infrastructure over 14 months,” he says.

However, it’s “the human side [that] kills more timelines than the tech does,” Parker observes. “Field supervisors and plant managers have work-arounds built around how legacy systems behave.” So, before touching a single rack, Parker’s team audits those informal processes, “because if you don’t, you migrate the infrastructure and orphan the people who actually use it.”

Overbyte’s Walker agrees, saying that almost all the snafus his team ran into during the online school deconstruction project were not technical, but came down to visibility. “At some point, you have to confront unknown systems; things with incomplete or outdated documentation,” he says. “We had moments where, after beginning to deprovision systems, stakeholders surfaced saying, ‘Wait, that’s still in use.’”

Dismantling systems is not the end

PPG experienced no disruptions during the dismantling process, Ramachandran says, other than some tactical delays and contracts that needed updating.

"There were some learnings on the network side because networking can get complex," he says. "Sometimes, we extended the outage windows" to up to five hours, for example. Those were the hiccups.

From start to finish, the decommissioning process of all eight data centers took about three years. “The end is not migrating all the workloads. The end is actually shutting down the data center,” Ramachandran stresses. This requires deconstructing the power, the cooling, fire systems, and multiple generators used for backup, which had to be removed by helicopter.

"You have to take the diesel fuel out and dispose of it and sell it. We have to get recertification of the building for safety, because this is a building where you had kilowatts of power coming in, which basically [also] went through a deconstruction process," he says. "So you have to get a safety certification … all of this takes time because we have to give the building back to the building management the way they gave it to us."

What data center deconstruction buys you

The painstaking data center deconstruction process has given Ramachandran valuable insight. “Make sure your best people spend time creating value for the business, as opposed to babysitting infrastructure,” he says, because infrastructure no longer adds value.

“You also do a lot of inherent risk management by getting rid of data centers and moving to a cloud environment you don’t have to worry about,” he adds. Noting the current state of the economy, Ramachandran says coping with sudden price increases for memory and chips is no longer stressful since they aren’t buying infrastructure.

“You’re basically giving back working capital to the company, because you’re moving the organization from a fixed capital environment to your variable cost model completely,” he says, “and you don’t have to refresh your hardware every four or five years.”

Cost was never the objective for the data center deconstruction, Ramachandran notes. “Nonetheless, when we did the business case, we said it’s not going to cost us any more or any less, but will buy us better security, better flexibility, better agility for the organization,” as well as better focus and technology. “And we achieved all of those.”

The value is in all those other areas. “We are not data center operators. The team is now focused on delivering applications that are meaningful to the business,” Ramachandran says. “The team is much closer than ever to the business because we are not talking infrastructure but how to make the business better.”

Walker says companies should measure twice, cut once. “Most teams want to jump straight into migration,” he says, “but the real work is building a complete inventory and mapping dependencies upfront.”

While it made sense for PPG to modernize some apps at the same time as the data center deconstruction work, Walker advises IT leaders to resist the urge to do everything at once. “Focus on moving what you understand first, and isolate the unknowns early,” he says.
“The success of these projects is usually determined by how well you handle the edge cases, not the easy wins.”

Any new technological development IT can make without interrupting operations dramatically reduces time to market, Ramachandran says.

Working on the latest technologies makes IT happy, and that helps with talent retention, he adds, “because we can say we’re cloud only, so this 143-year-old company looks modern. That is meaningful in so many ways.”

Column | Nvidia's long game: the real winning move amid AI bubble talk

AI is expected to change how we work and live, and even how network traffic flows. It is an interesting, even exciting, shift; but what if it all turns out to be overblown expectation? Wall Street is growing increasingly anxious about the claims being made for AI, and Nvidia sits at the center of the AI industry and therefore at the center of those concerns. Real uncertainty remains about whether today's AI models can change everything, including the financial results of AI companies like Nvidia. That is a clear risk, and when you face risk, you need insurance. What matters is choosing the right kind.

Most AI investment today is concentrated in cloud-based technology, which is natural given that cloud providers are doing much of the spending. But many investors see that approach as, in itself, a form of hype. Nvidia does not publicly disown the current model; it remains the company's main source of revenue. Instead, it is actively exploring a range of alternative AI approaches in case market expectations collapse. Some are small experiments, others are growing into large-scale projects, and all of them matter.

A representative example of Nvidia's "AI insurance strategy" shows up in its GPU driver updates. Through ChatRTX, Nvidia lets users run open-source chatbot LLMs directly on RTX 30-series and newer GPUs. Users can connect their own data while retaining full data sovereignty. The capability has existed for a while, but Nvidia has recently been promoting it more aggressively. It serves as an alternative to large cloud-based chatbots and as a showcase for on-premises, self-hosted AI.

Real-time computing is also emerging as an important pillar. Nvidia has steadily advanced technology for building digital twins with its AI tools, which it frames as "world models." The technology models physical systems and lets applications control real-time processes in the physical environment. In March, the company announced a large-scale initiative combining AI and world models with robots of many forms, including robotic arms, vehicles, and humanoid robots.

At the core of this latest strategy is perception technology: the ability to analyze the state of the real world from sensor and video data and feed it into the world model that mirrors the process being controlled. Keeping the world model continuously synchronized with reality is considered the most critical element of autonomous systems, because it is what prevents collisions between systems, avoids accidents involving facilities or people, and ensures everything works exactly as intended.

This is an area that could generate enormous revenue. Some assessments suggest the eventual market could rival the sum of every business case that has justified IT investment over the past 70 years. But the technology needs time to mature. Fewer than 20% of companies are currently using autonomous-system world models in real projects. Industry watchers see Nvidia and other major AI companies treating this market as a core growth driver around 2028.

If 2028 feels far off, there is an even longer-term technology: quantum computing. Some systems already exist, but there are effectively no real enterprise deployments. Recent analysis has also suggested that, contrary to earlier expectations, quantum computing may not scale without limit. Even so, the potential remains enormous. In theory, hardware on the scale of an ordinary server could match a supercomputer's performance, but no one knows when that will arrive. Rather than wait for the market to mature, Nvidia is moving preemptively.

Today the only way to validate quantum applications is simulation, and GPUs are considered the best platform for it. Nvidia supports simulation through its CUDA-Q platform and cuQuantum libraries, and is building low-latency connections between quantum systems and GPU servers through NVQlink and DGX Quantum. It has also launched the Nvidia Accelerated Quantum Research Center to foster collaboration between research institutions and companies. All of this is available in early-access form through Nvidia's quantum cloud.

Real-time computing and quantum computing get the attention, but some analysts argue the strategically most important element is small-scale AI such as personal chatbots. If AI world models and quantum computing are long-horizon insurance, personal chatbots are the practical bridge between present and future: real "AI insurance" rather than mere anticipation.

If you are not an Nvidia investor, you may wonder why any of this matters. The point is that, beneath the AI frenzy, a foundation for real business value is being built. Some practical enterprise value is already being demonstrated, but meaningful results will take considerable time. In the meantime, the industry view is that AI enthusiasm serves to keep the market's attention.

The challenge, then, is clear: with the market's attention focused elsewhere, can Nvidia draw interest to the AI business areas that get less notice? Whether it succeeds will be a key variable in the pace and scope of the coming AI transition. Insurance matters, but ultimately what matters more is securing a foundation for sustainable growth.

AWS cost drift: The operational cause nobody talks about

For many enterprises, cloud cost optimization has become a persistent challenge. Despite investments in FinOps tools, reserved instances, and cost monitoring platforms, AWS spend often continues to rise in ways that are difficult to predict or control. This phenomenon, often described as “cost drift,” is typically attributed to pricing models or workload growth.

However, the real cause is often less visible and more systemic: operational behavior.

As organizations scale in AWS, complexity increases across environments, services, and teams. Over time, this complexity introduces inefficiencies that are not always captured by traditional cost management approaches. Idle resources, overprovisioned infrastructure, fragmented ownership, and inconsistent governance all contribute to gradual cost increases that accumulate over time.

These issues are rarely the result of a single decision. Instead, they emerge from day-to-day operational patterns. Environments evolve, workloads expand, and teams make incremental changes, each of which may seem justified in isolation. But without consistent operational discipline, these changes compound into structural inefficiencies that drive cost drift.

A key challenge is that many organizations still manage cloud environments using reactive operating models. Teams respond to incidents, performance issues, or new requirements as they arise, often prioritizing speed over optimization. While this approach may support short-term agility, it can lead to long-term inefficiencies as resources are added but rarely removed or rightsized.

This dynamic is becoming more pronounced as cloud environments grow more complex. In fact, as explored in “Why running AI is now harder than building it”, many organizations are discovering that operationalizing modern workloads introduces new layers of cost and complexity that traditional models were not designed to manage.

Another contributing factor is the disconnect between cost visibility and operational accountability. While finance and FinOps teams may have detailed insights into spend, the decisions that drive that spend are often made across distributed engineering teams. Without clear ownership and alignment, it becomes difficult to enforce consistent cost controls or identify the root causes of inefficiency.

Automation, or the lack of it, also plays a critical role. Manual processes for provisioning, scaling, and incident response can introduce variability and delay optimization efforts. In contrast, environments that leverage automation and AI-driven operations are better positioned to continuously monitor usage, identify anomalies, and take corrective action in real time.
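
As one illustration of what such automation can look like, the sketch below uses the AWS SDK for Python (boto3) to flag running EC2 instances whose average CPU has stayed under an assumed idle threshold for two weeks. The threshold and time window are arbitrary examples, and a real rightsizing workflow would consider far more signals than CPU alone.

```python
# Illustrative sketch of an automated idle-resource check: flag EC2 instances whose
# average CPU has stayed below a threshold for two weeks. Thresholds are hypothetical;
# this is not a complete rightsizing tool.
from datetime import datetime, timedelta, timezone
import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)
start = end - timedelta(days=14)


def avg_cpu(instance_id: str) -> float:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=86400,               # one datapoint per day
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0


paginator = ec2.get_paginator("describe_instances")
filters = [{"Name": "instance-state-name", "Values": ["running"]}]
for page in paginator.paginate(Filters=filters):
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            cpu = avg_cpu(inst["InstanceId"])
            if cpu < 5.0:           # assumed "idle" threshold
                print(f"{inst['InstanceId']} ({inst['InstanceType']}): "
                      f"avg CPU {cpu:.1f}% over 14 days")
```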

This shift toward more intelligent operations is increasingly important. Organizations that embed automation and operational intelligence into their cloud environments can move from reactive cost management to proactive optimization. This not only helps control spend but also improves performance, resilience, and overall operational efficiency.

Importantly, addressing cost drift requires more than financial oversight. It requires rethinking how cloud environments are operated.

Leading organizations are adopting more structured operating models that integrate cost management into daily workflows. This includes establishing clear ownership of resources, enforcing governance policies, and embedding cost considerations into engineering decisions from the outset. Rather than treating cost optimization as a periodic exercise, it becomes a continuous discipline.
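
A minimal sketch of the ownership piece, again assuming boto3: scan EC2 instances for the tags a cost policy requires, so any untagged resource can be traced back to a team. The required tag keys here ("owner", "cost-center") are example policy choices, not an AWS standard.

```python
# Minimal sketch of a governance check: report EC2 instances missing the tags that a
# cost-ownership policy requires. Tag keys are assumed examples, not an AWS standard.
import boto3

REQUIRED_TAGS = {"owner", "cost-center"}

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instances")

untagged = []
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            tags = {t["Key"] for t in inst.get("Tags", [])}
            missing = REQUIRED_TAGS - tags
            if missing:
                untagged.append((inst["InstanceId"], sorted(missing)))

for instance_id, missing in untagged:
    print(f"{instance_id} is missing required tags: {', '.join(missing)}")
```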

These organizations also focus on improving visibility across environments. By correlating operational data with cost data, they can better understand how specific actions or inefficiencies impact overall spend. This enables more targeted optimization efforts and helps prevent cost drift before it becomes significant.

The impact of this approach can be substantial. In environments where operational inefficiency is a major contributor to spend, organizations can achieve meaningful cost reductions while also improving recovery performance and governance consistency.

For CIOs and technology leaders, the takeaway is clear. AWS cost drift is not just a financial issue. It is an operational one. Without addressing the underlying behaviors and processes that drive inefficiency, even the most advanced cost management tools will have limited impact.

The path forward lies in evolving from reactive cloud operations to more proactive, intelligence-driven models. By embedding automation, improving accountability, and aligning teams around shared operational and financial goals, organizations can bring greater predictability to cloud spend while supporting continued innovation.

As cloud environments continue to scale and as new workloads, including AI, place additional demands on infrastructure, the ability to operate efficiently will become a key differentiator. Organizations that address the root causes of cost drift will be better positioned to optimize performance, control costs, and maximize the value of their cloud investments.

To learn more about handling AWS cost drift, check out the Run AWS at Scale e-book.


Why SaaS companies must become octopuses to survive AI

Sixty-six million years ago, an asteroid wiped out 75% of species on Earth. Octopuses survived, however, because of their ability to radically adapt their biology in hours, not eons. Today, SaaS companies face their own asteroid: AI. And the octopus points the way to survival.

We see the signs in the firms that are thriving today. When Upwork Senior Vice President Dave Bottoms rebuilt the company’s AI stack, he made a counterintuitive choice. Rather than optimizing for today’s best model, he architected for disposability.

“What we think is the best model today may not be the best model tomorrow,” Bottoms explains. His team built an “optionality layer” that lets them swap AI models like changing batteries.
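
What an "optionality layer" could look like in practice is sketched below: application code depends on a narrow interface, and the concrete model behind it is selected by configuration, so swapping providers does not touch business logic. The class and registry names are hypothetical and do not describe Upwork's actual implementation.

```python
# Hypothetical sketch of an "optionality layer": application code talks to a narrow
# interface, and the concrete model behind it can be swapped by config. The provider
# classes are stand-ins, not Upwork's actual architecture.
from typing import Protocol


class TextModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class VendorAModel:
    def complete(self, prompt: str) -> str:
        # a real implementation would call vendor A's API here
        return f"[vendor-a] {prompt[:40]}..."


class VendorBModel:
    def complete(self, prompt: str) -> str:
        # a real implementation would call vendor B's API here
        return f"[vendor-b] {prompt[:40]}..."


MODEL_REGISTRY: dict[str, type] = {
    "vendor-a": VendorAModel,
    "vendor-b": VendorBModel,
}


def get_model(name: str) -> TextModel:
    """Swap the active model by changing one config value, not application code."""
    return MODEL_REGISTRY[name]()


model = get_model("vendor-a")   # tomorrow this could read "vendor-b" from config
print(model.complete("Summarize this job post for a client"))
```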

As Bottoms recognized, AI is evolving faster than SaaS architectural cycles, and rigid AI implementations are becoming legacy systems the moment they ship.

Design for your customers’ jobs, not your technology

The fatal mistake many SaaS companies make with AI is starting with what the technology can do rather than what customers need it to do. Aarthi Ramamurthy, chief product officer at CommerceHub (now Rithum), begins differently. “I start with empathy,” she says. “I know what the retail ecosystem goes through and all the complexity in it.”

That empathy led CommerceHub to focus AI on a deceptively simple problem: supplier onboarding. When one company calls a sweater pink and another calls it fuchsia, manual matching creates friction.

But Ramamurthy’s team didn’t just throw AI at the problem. They mapped the specific “Jobs to Be Done”—such as getting suppliers connected faster with fewer errors—and then selected the right AI approach, starting with simple algorithmic matching before layering in machine learning for demand prediction.
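
The sketch below illustrates the kind of simple, deterministic matching that can come before any machine learning: a small synonym map plus fuzzy string similarity to reconcile supplier attributes like "fuchsia" versus "pink." The synonym table, threshold, and data are made up for illustration and are not CommerceHub's system.

```python
# Illustrative only: simple attribute matching that can precede any machine learning.
# A small synonym map plus fuzzy string similarity lines up supplier attributes.
from difflib import SequenceMatcher

COLOR_SYNONYMS = {
    "fuchsia": "pink",
    "magenta": "pink",
    "charcoal": "gray",
    "graphite": "gray",
}


def normalize(value: str) -> str:
    v = value.strip().lower()
    return COLOR_SYNONYMS.get(v, v)


def attributes_match(supplier_value: str, retailer_value: str,
                     threshold: float = 0.85) -> bool:
    a, b = normalize(supplier_value), normalize(retailer_value)
    if a == b:
        return True
    # fall back to fuzzy similarity for formatting variants ("navy blue" vs "navy-blue")
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(attributes_match("Fuchsia", "pink"))        # True via the synonym map
print(attributes_match("Navy Blue", "navy-blue")) # True via fuzzy similarity
```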

Contrast this with an all-too-typical approach: We’ve got LLMs, let’s find somewhere to use them. Sushma Kittali-Weidner, former chief product officer at Rheaply, frequently sees this mistake. She explains: “People are looking for magic but not thinking enough about how AI can create efficiencies in existing processes.”

Enable octopus organizations

The most valuable thing SaaS companies can do with AI is enabled by the technology but much bigger than the technology itself: helping customers distribute intelligence and authority throughout their organizations. Allow them to be like the octopus, which has two-thirds of its neural tissue outside its central brain. Octopus organizations move faster and more responsively because decisions get made closer to the frontline.

SaaS company Movable Ink’s Da Vinci platform demonstrates this enablement. CEO Vivek Sharma built a system to mass-send hyper-tailored emails by combining vision models, generative AI, insight engines and prediction algorithms.

The platform pushes sophisticated personalization decisions to frontline marketers who previously needed executive approval for campaign variations. The system determines what stories are delivered to customers, which imagery and creative are used, and when, how often and where to deliver them.

This is authority devolution at scale. Each marketer becomes vastly more capable, teaming with AI to make thousands of micro-decisions that would have been impossible under traditional hierarchies. Movable Ink’s customers can now generate hundreds of thousands of email variations where they once created one.

Within one of the world’s largest commerce networks, CommerceHub’s 2.4 billion daily transactions create similar distributed intelligence, pushing supplier matching and inventory decisions to procurement teams. CommerceHub’s AI mines poorly structured data and surfaces patterns that enable frontline employees to act without escalation.

In short, winning SaaS products help customers become octopus organizations—distributed, adaptive and intelligent at every edge.

Break your own silos

The uncomfortable truth is that most SaaS companies can’t help customers become octopuses because they’re not octopuses themselves.

The octopus has a “neural necklace,” a ring of nerve bundles that connects all its arms, enabling instant information sharing among them without involving the central brain. But SaaS companies frequently have broken connections. Just look at customer success and product teams.

Customer success teams hear about where products fail, where workflows create friction and where latent needs go unmet. Product teams have usage telemetry and performance data. When this information flows freely between teams, you create extraordinary sensing capability. But typically, these teams have separate reporting lines. They exchange sanitized summaries while critical signals vanish.

CommerceHub’s Ramamurthy addressed this by starting AI deployment on internal insights dashboards before adding it to external features. This created shared understanding across functions. When customer success, product and engineering teams access the same AI-generated insights about customer behavior, they develop a common language and aligned priorities.

Build for continuous transformation

The octopus can reconfigure its RNA in hours, adjusting biological processes faster than evolution allows. This is how it’s survived for 300 million years without external defenses. SaaS companies need to adapt at a similar speed because AI capabilities shift weekly.

Kittali-Weidner experienced this. Her team at Rheaply was resource-constrained and couldn’t afford over-engineering, so they designed modular AI implementations that could evolve without massive refactoring. The research and prototyping process that once took weeks now happens in real-time co-creation sessions. That’s a true competitive advantage.

On-demand adaptation demands a new team composition. You need engineers who embrace disposable code, product managers who ship features knowing they’ll be replaced in months, and executives willing to deprecate yesterday’s breakthroughs for tomorrow’s improvements.

Design for the 80/20 rule

SaaS companies stumble by automating too little, leaving AI as a novelty, or automating too much, triggering resistance. At Upwork, Bottoms has learned that “80% of the work can be automated, but the last 20% still requires human judgment.” Upwork’s AI, for example, generates job posts and proposals, but humans make the hiring decisions.

Similarly, Movable Ink succeeds by making AI suggestions initially optional and editable. Users see value while maintaining control. Only after establishing trust does the system shift toward AI-as-default.
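
A generic sketch of that 80/20 hand-off is shown below: the system drafts, a human can edit, and nothing publishes without an explicit approval. The function and field names are illustrative and do not describe Upwork's or Movable Ink's actual code.

```python
# Generic sketch of the 80/20 pattern: the system drafts, a human reviews, and nothing
# ships without an explicit accept. All names are illustrative.
from dataclasses import dataclass


@dataclass
class Draft:
    text: str
    source: str = "ai"
    approved: bool = False


def ai_draft_job_post(title: str) -> Draft:
    # stand-in for a model call; in practice this would hit an LLM endpoint
    return Draft(text=f"We are hiring a {title}. Responsibilities include ...")


def human_review(draft: Draft, edits: str | None, accept: bool) -> Draft:
    """The human can edit freely, but only an explicit accept marks it publishable."""
    if edits is not None:
        draft.text, draft.source = edits, "human-edited"
    draft.approved = accept
    return draft


def publish(draft: Draft) -> None:
    if not draft.approved:
        raise ValueError("refusing to publish an unapproved draft")
    print(f"published ({draft.source}): {draft.text[:60]}")


draft = ai_draft_job_post("senior data engineer")
draft = human_review(draft, edits=None, accept=True)
publish(draft)
```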

Adapt for the future

The octopus teaches us that survival belongs to the adaptable.

Externally, your product must help customers become octopus organizations: Distributing intelligence, devolving authority and adapting rapidly. Internally, you must become an octopus yourself: Connecting information across silos, building for continuous transformation and balancing autonomy with coordination.

The AI asteroid is already here. Become an octopus and thrive.

This article is published as part of the Foundry Expert Contributor Network.

What Google’s “unified stack” pitch at Cloud Next ‘26 really means for CIOs

Google didn’t so much announce products at Cloud Next ’26 as try to reframe the real bottleneck to scaling AI: the piecemeal architecture CIOs have been building while trying to stitch everything together themselves.

For years, enterprises have treated AI like a kit, with models, infrastructure, and data spread across different vendors and heterogeneous environments, an approach that worked well enough in pilot mode but has proven harder to scale into something dependable.

That, at least, is the problem that Google Cloud CEO Thomas Kurian chose to name, and own, on stage. “You have moved beyond the pilot. The experimental phase is behind us,” he said, before posing the more uncomfortable question for CIOs: “How do you move AI into production across your entire enterprise?”

His answer: “A unified stack.”

What that “unified stack” amounts to in practice, though, is Google stitching together layers it has historically sold and marketed separately into an architecture that represents a single operating fabric for enterprise AI.  

Kurian cast it as the “connective tissue” binding what are typically siloed layers, such as custom silicon, models, data, applications, and security, into a single, coordinated system. That translates into workload-specific TPUs to run and scale AI, Gemini Enterprise and the Gemini Enterprise Agent Platform to build and embed agents into business workflows, the Agentic Data Cloud to ground them in enterprise context, and a parallel push to secure both agents and the infrastructure they run on.

A turnkey answer to integration fatigue?

It’s a neat and timely argument for enterprises, said independent consultant David Linthicum, especially for those frustrated with stalled pilots as a result of fragmented AI stacks.

In addition, noted Ashish Chaturvedi, leader of executive research at HFS Research, most CIOs are drowning in integration tax, which compounds the costs of scaling an AI initiative. “The average enterprise has spent the last two years stitching together models from one vendor, orchestration from another, data pipelines from a third, and governance as an afterthought,”  Chaturvedi said. “Google, in contrast, is pitching a turnkey solution.”

That turnkey solution, said Shelly DeMotte Kramer, principal analyst at Kramer & Company, could be attractive on a number of fronts if CIOs are building on Google Cloud. It could reduce integration risk, offer faster pilot-to-production trajectories, and democratize AI across the organization and beyond IT via the Workspace Studio no-code agent builder.

Concerns around execution and clarity

However, Kramer is not confident about Google’s execution of its unified stack vision. “Google Cloud has consistently come in third place in terms of enterprise cloud share, with what could, in all candor, be called thinner organizational muscle for large-scale professional services engagements than what you might expect from AWS and Microsoft,” he said.

Stephanie Walter, HyperFRAME Research’s leader of the AI stack, also has doubts. She questioned the clarity of the offerings that Google is packaging and marketing as part of that vision.

“While the pitch will resonate with enterprises tired of stitching together products to scale AI, it lacks clarity,” she said. “Google announced a lot at once, and the way the AI product portfolio fits together is still somewhat unclear, so CIOs will like the ambition while still asking for a cleaner map of where Gemini Enterprise, the Agent Platform, the Application, and the data layer begin and end.”

Converging vendor visions add complexity

That ambiguity, analysts say, will be further deepened for CIOs as they try to evaluate Google’s pitch against converging visions from rivals AWS and Microsoft, who, since last year, have been promoting their own visions of moving AI pilots into production.

While the convergence in vendor pitches will simplify choices at a high level, it will add complexity in practice because the control planes, pricing, ecosystem depth, and interoperability across offerings vary meaningfully, Linthicum said.

“CIOs still have to map those differences to their existing estate, talent base, and governance model. Similar narratives do not mean equivalent operating realities,” he added.

That, according to Walter, risks leaving CIOs comparing architectures that sound strikingly alike on paper, even as their underlying trade-offs remain difficult to parse at an operational level.

The convergence in vendor pitches could also backfire on Google, Chaturvedi noted. “The more similar the top-line narratives become, the more the decision swings on non-technical factors such as existing relationships, migration costs, and trust,” he said.

If anything, that dynamic may push enterprises toward a more pragmatic split. Paul Chada, co-founder of agentic AI startup Doozer AI, expects CIOs to end up standardizing on two distinct layers when scaling AI: a primary agent control plane aligned with where enterprise applications and user workflows reside, and a separate data reasoning layer anchored in governed data environments.

“The dream of a single vendor owning both likely won’t survive procurement,” he said.

“Unified” could still mean complex pricing

Further, analysts pointed out that Google’s unified stack pitch could introduce concerns for CIOs that go beyond architectural clarity.

For example, Linthicum noted that bundling infrastructure, models, data services, and agents into a single narrative doesn’t necessarily simplify costs; rather, it makes pricing harder to predict and optimize.

“A unified product story can still produce a highly fragmented bill. CIOs should expect more pricing complexity,” he said.

And Mike Leone, principal analyst at Moor Insights and Strategy, added that the problem of pricing complexity around AI offerings doesn’t change when CIOs switch vendors. “Every hyperscaler is walking in the same direction,” he said.

That, said Dion Hinchcliffe, lead of the CIO practice at The Futurum Group, leaves CIOs with fewer levers to simplify costs at the vendor level and more responsibility to manage them internally. To that end, he added, enterprises will need to lean more heavily on FinOps disciplines to regain control over increasingly complex and opaque AI spending.

Different strengths

There is, however, a more nuanced upside for CIOs willing to look past the unified vision pitch.

Kramer, for one, pointed to Google’s control over its own AI silicon as a potential differentiator. “That makes the comparatively better performance-per-dollar pitch for AI workloads at the infra level somewhat defensible,” he said.

At the same time, the analysts agreed, the competitive field, at least for CIOs, is far from settled.

“Microsoft looks best positioned on enterprise distribution and workflow adjacency. AWS is strongest on operational breadth, developer familiarity, and cloud maturity. Google is strongest where AI infrastructure, analytics, and model-platform integration matter most,” Linthicum said.

CIOs, in turn, should align vendor strengths with enterprise priorities, whether that’s driving user adoption, scaling operations, or deepening AI and data platform capabilities, he added.
