
When AI writes code, it joins the software supply chain

May 7, 2026, 07:00

AI tools designed to assist developers are no longer staying in the background. They are starting to shape what actually gets built and deployed.

They open pull requests.

They modify dependencies.

They generate infrastructure templates.

They interact directly with repositories and CI/CD pipelines.

At some point, this stops being assistance.

It becomes participation.

And participation changes the problem.

When assistance becomes participation

The shift from generative to agentic behavior is the inflection point.

Earlier tools operated inside a tight loop. A developer prompted. The system suggested. The developer reviewed. Nothing moved without human intent.

That boundary is eroding.

Newer systems propose changes, update libraries, remediate vulnerabilities and interact with development pipelines with limited human intervention. They don’t just accelerate developers. They begin to shape the artifacts that move through the software supply chain — code, dependencies, configurations and infrastructure definitions.

That makes them something different.

Not tools.

Participants.

And once something participates in the supply chain, it inherits the same question every other participant does:

How is it governed?

A simple scenario

Consider a common pattern already emerging in many environments.

An AI system identifies a vulnerable dependency.

It opens a pull request updating the library.

A workflow triggers automated tests.

The change is promoted into a staging environment.

Four steps.

No human review.

No explicit governance checkpoint.

Each step is individually valid. Nothing looks wrong in isolation.

But taken together, they create something fundamentally different: A system that can change enterprise software without human intent being re-established at any point. Research from Black Duck found that while 95% of organizations now use AI in their development process, only 24% properly evaluate AI-generated code for security and quality risks.

This is autonomous change propagation across the software supply chain.

The “human-in-the-loop” fallacy

Many organizations rely on a “human-in-the-loop” (HITL) requirement as a safety mechanism for AI-generated code.

At low volumes, this works.

At scale, it breaks.

When an AI system generates dozens of pull requests in a short window, review becomes a throughput problem, not a control. The cognitive load of validating machine-generated logic exceeds what a human can realistically govern.

What remains is not oversight, but a checkpoint.

And checkpoints without effective review are not controls.
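
One way to make the checkpoint real is to fail closed: if a machine authored the change and no human approved it, nothing promotes. Below is a minimal sketch against GitHub's REST API; the endpoints exist, but the repository name and the policy itself are illustrative, not a prescribed control.

```python
import os
import sys

import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def human_approved(owner: str, repo: str, pr_number: int) -> bool:
    """True only if at least one non-bot reviewer approved the PR."""
    base = f"{API}/repos/{owner}/{repo}/pulls/{pr_number}"
    pr = requests.get(base, headers=HEADERS, timeout=30).json()
    reviews = requests.get(f"{base}/reviews", headers=HEADERS, timeout=30).json()
    human_approvals = [
        r for r in reviews
        if r["state"] == "APPROVED" and r["user"]["type"] != "Bot"
    ]
    # Bot-authored changes get no free pass: a human must re-establish intent.
    if pr["user"]["type"] == "Bot" and not human_approvals:
        return False
    return bool(human_approvals)

if __name__ == "__main__":
    # Example: run as a promotion gate in CI (repo name is hypothetical).
    if not human_approved("acme", "payments-service", int(sys.argv[1])):
        sys.exit("No human approval on record -- refusing to promote.")
```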

The governance gap

Most governance models assume a stable truth: Humans are the primary actors.

Controls tie identity to individuals, approvals to intent and audit trails to accountability.

Even automation systems are treated as extensions of human intent — predictable, bounded and deterministic.

AI systems break that model.

They can generate new logic, act on it and propagate changes across systems. Yet in most environments, they are still governed as if they were static tools.

That mismatch is the gap.

Machine identity is no longer what it was

One way to see this clearly is through identity.

Every interaction an AI system has — repository access, pipeline execution, API calls — requires credentials. In practice, these systems operate as machine identities.

But they are not traditional machine identities.

A service account executes predefined logic. Its behavior is known in advance. Its risk is bounded by what it was configured to do.

An AI-driven system is different. It generates the logic it then executes.

It can propose new code paths, interact with new systems and trigger actions that were not explicitly predefined at the time access was granted.

That is a category change.

Not just a new identity type, but a new attack surface: Identities that can generate the behavior they are authorized to execute.

The World Economic Forum has identified this class of non-human identity as one of the fastest-growing and least-governed security risks in enterprise AI adoption.
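
Bounding that new surface starts with deny-by-default authority. Here is a minimal sketch of a per-identity action policy; the action names are hypothetical, and a real deployment would enforce this in the identity provider or a policy engine rather than in application code.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentPolicy:
    """Deny-by-default authority scope for one machine identity."""
    identity: str
    allowed_actions: frozenset = field(default_factory=frozenset)

    def permits(self, action: str, target: str) -> bool:
        # Authority is granted per (action, target) pair, never wholesale.
        return (action, target) in self.allowed_actions

# Hypothetical scope: the agent may read and open PRs on one repo,
# but may not merge, deploy, or touch infrastructure definitions.
dep_bot = AgentPolicy(
    identity="dep-update-agent",
    allowed_actions=frozenset({
        ("open_pull_request", "payments-service"),
        ("read_repo", "payments-service"),
    }),
)

assert dep_bot.permits("open_pull_request", "payments-service")
assert not dep_bot.permits("merge_pull_request", "payments-service")
assert not dep_bot.permits("apply_terraform", "prod-network")
```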

Measuring exposure before solving it

Most organizations already track access-related metrics. Those metrics were designed for human-driven systems.

They are no longer sufficient.

If AI systems are participating in the software supply chain, organizations need to measure where and how that participation introduces risk.

A few signals matter immediately:

  • AI-generated artifact footprint: What portion of code, dependencies or infrastructure definitions in production originates from AI-assisted processes?

  • Authority scope of AI systems: What systems can these identities access — and what actions can they take across repositories and pipelines?

  • Autonomous change rate: How often are changes introduced and propagated without explicit human review?

  • Cross-system interaction surface: How many systems does a single AI workflow touch as part of normal operation?

  • Auditability of AI-driven actions: Can changes be traced cleanly to a system, workflow and triggering context?

These are not abstract concerns. They are measurable.

And until they are measured, they are not governed.
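
The autonomous change rate, for example, falls straight out of merge metadata. A minimal sketch, assuming each merged change is annotated with its author type and reviewers (the field names are illustrative):

```python
def autonomous_change_rate(changes: list[dict]) -> float:
    """Share of merged changes with a machine author and no human review."""
    if not changes:
        return 0.0
    autonomous = [
        c for c in changes
        if c["author_type"] == "bot" and not c["human_reviewers"]
    ]
    return len(autonomous) / len(changes)

# Illustrative merge log: one bot change slipped through unreviewed.
merged = [
    {"id": 101, "author_type": "bot",   "human_reviewers": []},
    {"id": 102, "author_type": "human", "human_reviewers": ["lee"]},
    {"id": 103, "author_type": "bot",   "human_reviewers": ["kim"]},
]
print(f"autonomous change rate: {autonomous_change_rate(merged):.0%}")  # 33%
```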

The regulatory imperative

This is not just a technical shift. It is a governance and liability shift.

As regulatory expectations evolve — from AI accountability frameworks to cybersecurity disclosure requirements — organizations are increasingly responsible for explaining and controlling automated decisions inside their environments.

If an AI-driven change introduces a vulnerability or leads to a material incident, “the system generated it” will not be an acceptable answer.

Accountability will still sit with the enterprise.

That raises the bar: Governance must extend to how autonomous systems act, not just how they are accessed.

The architecture gap

[Diagram: AI systems operate horizontally across systems, while governance remains vertical. Credit: Puneet Bhatnagar]

The issue is not that any one control is missing.

It is that AI systems operate across the seams of systems designed to govern within their own boundaries.

Repositories enforce code controls.

Pipelines enforce deployment controls.

Identity systems enforce access controls.

Security tools enforce policy checks.

Each works as designed.

But AI systems move across all of them.

They read from one system, generate changes, trigger another and influence a third. Authority is exercised across systems, while governance remains within them.

That is the architectural gap.

A different governance model

Most organizations will respond to this shift by trying to extend existing access controls. That instinct is understandable — and insufficient.

The problem is no longer just who or what can access a system. It is how control is maintained when authority can generate new actions dynamically.

This requires a different model of governance.

One that treats software systems as actors whose behavior must be bounded, observed and continuously evaluated across workflows — not just permitted or denied at a point of access. Governance becomes less about static permissions and more about controlling the shape and impact of actions across systems.

That is the shift.

Conclusion

The conversation around AI in software development often focuses on productivity.

But as AI systems begin to participate in producing and modifying enterprise software, the more important question becomes governance.

AI is not just accelerating the software development lifecycle. It is becoming part of the software supply chain itself.

And that changes the problem.

The challenge for CIOs is no longer just managing developers, tools or pipelines. It is understanding and governing the authority that software systems exercise across them.

Because in a world where software can act on behalf of the enterprise, governance is no longer just about access.

It is about authority — what systems are allowed to do, and how that authority is controlled and measured over time.

This article is published as part of the Foundry Expert Contributor Network.


I gave our developers an AI coding assistant. The security team nearly mutinied

May 6, 2026, 09:00

I’ve sat in enough risk meetings to know the sound a bad surprise makes before anyone names it. It usually starts with a pause. Then a throat gets cleared. Then someone says, “We may need to bring the CISO into this.”

That happened over a developer tool.

Not a breach. Not a regulator. Not ransomware at 2:00 a.m. A coding assistant.

At first, I thought the reaction was overcooked. I’d seen the same pattern in other boardrooms and delivery teams. A new tool appears. Engineers like it because it saves time. Leadership likes it because it promises more output without hiring half a city. Security hates it because security has the social burden of being the adult in the room when everyone else is buying fireworks.

I backed the rollout because the case was clean on paper. Developers were drowning in repetitive work. Deadlines were tightening. Technical debt had started breeding in the dark. The assistant could draft tests, explain old code, suggest refactors and help junior engineers stop treating Stack Overflow like an underground pharmacy. And this was no longer fringe behavior. In 2025, Microsoft said that 15 million developers were already using GitHub Copilot, and the tool has spread further since then.

So yes, I approved it.

Then security nearly revolted.

That week taught me something I now say to clients more bluntly than I used to. AI coding tools do not just change software delivery. They change the terms of trust inside the company. They force you to answer ugly questions about control, proof, accountability and review discipline. Most public coverage still stares at productivity. The harder story sits elsewhere. Governance.

The part that looked sensible

The truth is, I didn’t approve the tool because I was dazzled. I approved it because I’ve spent years watching good people waste good hours on bad repetition.

You can only tell a team to “be strategic” so many times before they start laughing at you. Developers were buried under boilerplate, documentation drift, brittle legacy code and the kind of ticket churn that makes bright people look tired. A coding assistant looked like a relief. Not magic. Relief.

That distinction matters.

In advisory work, I’ve learned that many poor decisions do not begin as foolish decisions. They begin as reasonable decisions made inside an outdated control model. That’s what this was. The business case made sense. The mistake was assuming the old review system could keep up with the new speed.

That old assumption dies hard. Leaders often think software risk changes when the code changes. Often, it changes earlier, as production conditions change. If a machine now drafts what humans once wrote line by line, the issue is not only code quality. It is code volume, code origin and the shrinking time between suggestion and production.

That is a different risk shape.

Why security lost its patience

The security team was upset because they could see the math.

Code output was about to rise. Review time was not.

That gap is where trouble rents office space.

Many non-security leaders still imagine the concern is simple. “The AI might write bad code.” That’s the kindergarten version. The real concern is broader and nastier. Who reviewed the output? What hidden package did the model nudge into the build? What sensitive context got pasted into the prompt window? Which junior engineer trusted the suggestion because it sounded calm and looked polished? Which policy assumed human authorship when the draft came from somewhere else?

Those are not philosophical questions. They are operating questions.

Recent security work has made this much harder to dismiss. Snyk described a February 2026 case in which a vulnerability chain turned an AI coding tool’s issue triage bot into a supply chain attack path. That is the sort of sentence that makes security teams sit up straight and ask for names, logs and meeting invites.

And that is before you get to the quieter problem. AI-generated code can look tidy long before it is safe. Security people know that neat syntax can hide weak controls, lazy validation, poor handling of secrets and dependency choices nobody meant to own.

So when the team escalated, they weren’t staging a mutiny over a plugin. They were reacting to a change in production logic that nobody had yet governed.

What the fight was really about

Once the temperature dropped, the shape of the dispute became obvious to me. It was not engineering versus security. It was speed versus proof.

More precisely, it was four things:

  1. Velocity. The assistant increased output far faster than assurance could keep pace.
  2. Visibility. We did not have a clear sight of where the tool was used, what prompts were fed into it, what code it influenced or what external components it smuggled into the discussion.
  3. Validation. Existing checks were built for a world in which humans produced most of the first draft. That world is fading. When code generation speeds up, review cannot stay ceremonial.
  4. Governance. Nobody had written the rules that mattered most. Which use cases were fine? Which were off-limits? Who owned the risk of acceptance? What evidence would prove that the tool was used safely enough?

That last point gets too little airtime. Governance sounds dull until you don’t have it. Then it becomes the difference between controlled use and polite chaos.

NIST’s recent work on monitoring deployed AI systems makes the same point more broadly. Organizations need post-deployment measurement and monitoring because real-world behavior drifts, surprises occur and governance after launch remains immature. Different setting, same lesson. You cannot inspect your way out of weak operating design.

What we did next

We did not ban the tool. That would have been theatre dressed as courage.

We also did not wave it through and tell security to “partner more closely.” I’ve heard that sentence enough times to know it usually means, “Please absorb more risk with better manners.”

We did something less dramatic and more useful. We narrowed the rollout and rewrote the conditions of trust.

Low-risk use cases stayed in play. Drafting tests. Explaining old functions. Helping with documentation. Suggesting boilerplate. Those were manageable.

High-risk areas got tighter boundaries. Auth flows. Secrets handling. Encryption logic. Infrastructure-as-code for sensitive environments. Anything tied to regulated data or material security controls. Those needed a stricter review or stayed out of scope.

We also drew a hard line on prompt hygiene. No customer data. No credentials. No confidential architecture details dropped into a chat window because someone wanted a faster answer on a Friday afternoon. You would think that goes without saying. It does not.
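
Some of that hygiene can be automated at the boundary. Here is a minimal sketch of a pre-send filter that fails closed on obvious secrets and identifiers; the patterns are illustrative, and no regex list replaces DLP tooling, policy and training.

```python
import re

# Illustrative patterns only: real deployments pair this with DLP tooling.
BLOCKLIST = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key":    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email address":  re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "bearer token":   re.compile(r"\b[Bb]earer\s+[A-Za-z0-9._\-]{20,}\b"),
}

def check_prompt(prompt: str) -> list[str]:
    """Return the names of any blocked patterns found in the prompt."""
    return [name for name, pattern in BLOCKLIST.items()
            if pattern.search(prompt)]

violations = check_prompt("Here is our config: AKIAABCDEFGHIJKLMNOP ...")
if violations:
    # Fail closed: the prompt never reaches the assistant.
    raise ValueError(f"Prompt blocked: {', '.join(violations)}")
```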

Then we raised the review standard. Human sign-off meant real sign-off, not a quick skim and a merge. Scanning had to cover dependencies and code changes with more discipline. Provenance mattered more. Logging mattered more. Exception paths had to be explicit, not social.

Most importantly, security moved from late-stage critic to co-designer. That changed the tone. The question stopped being, “Can we use this?” and became, “Under what conditions can we trust its use enough to defend it later?”

That small shift matters more than many policy documents.

What both sides got right — and wrong

Developers were right about the waste. They were right that these tools remove drudgery. They were right that refusing every new capability is not a strategy. A team that cannot experiment eventually decays into compliance theatre and backlog sorrow.

They were wrong to assume readable code is trustworthy code. They were wrong to treat assistance as neutral. Tools shape behavior. That is what tools do. Once suggestions arrive fast and fluently, people accept more than they admit.

Security was right about review debt. Right about supply chain exposure. Right about data leakage risk. Right that governance should not arrive three incidents late, wearing a blazer and a lessons-learned slide.

They were wrong at first, as many security teams are when they feel cornered. They made the conversation sound like a moral referendum. That never helps. If security cannot offer a usable path, the business routes around it. Then you get the worst of both worlds: Secret adoption and public optimism.

I don’t say that with smugness. I say it because I’ve watched good teams damage each other by defending the right thing in the wrong way.

The bigger lesson for leaders

This is where the story stops being about one rollout and starts becoming board material.

If your developers can now produce more code with less effort, your governance burden rises even if your headcount does not. The old ratio between output and oversight has broken. Many firms have not adjusted.

That matters because software governance is no longer just about secure coding standards or release gates. It is about production conditions. Who can generate? Under what rules? With what evidence? Across which risk zones? With whose approval? And if something goes wrong, who owns the final act of acceptance?

Those questions sound administrative until the first incident report lands, and nobody can explain whether the flawed logic was written, suggested, copied, reviewed or merely assumed.

The market is moving quickly. Microsoft’s own recent security reporting says organizations adopting AI agents need observability, governance and security now, not later. Snyk is making a similar argument from the perspective of the software supply chain. Visibility first. Then prevention. Then governance that holds under pressure.

That is why I now advise something that used to sound severe and now sounds merely accurate. If you deploy AI coding tools without redesigning your control model, you are not buying productivity. You are buying ambiguity at machine speed.

What you should ask before you approve the next tool

You do not need a grand doctrine. You need a few hard questions asked before excitement turns into policy by accident.

Where can this tool be used, and where can’t it be used?

What data may enter it?

How will you know when the generated code reaches production?

What review standard applies when the first draft came from a machine?

Who can approve exceptions?

What logs, scans and decision records will let you defend the setup six months later, when memories blur and staff rotate?

That is not bureaucracy. That is self-respect.

I still believe these tools have value. I’d be foolish not to. But I trust them the way I trust a very fast junior colleague with a beautiful writing style and uneven judgment. Useful. Impressive. Worth keeping. Not someone you leave unsupervised near the crown jewels.

The near-mutiny turned out to be healthy. It forced the truth into the room before a failure did. Security was not blocking progress. They were objecting to unmanaged speed. Developers were not being reckless. They were asking for relief from the grind. Leadership’s job was not to pick a side. It was to write a better contract between them.

That is the part that too many firms still miss.

The argument was never only about a coding assistant. It was about whether we still knew how to govern work once the work started moving faster than our habits. That is a much bigger story. And if you listen carefully, you can hear it starting in many companies right now.

This article is published as part of the Foundry Expert Contributor Network.


Vibe coding goes enterprise: What you need to know about AI-driven legacy modernization

May 5, 2026, 09:00

Google’s CEO says vibe coding makes programming “enjoyable” and “exciting again.” Klarna’s CEO prototypes products in 20 minutes instead of waiting two weeks. Collins Dictionary named “vibe coding” its Word of the Year for 2025. The message seems clear: AI has democratized software development. Just describe what you want in plain English and let AI handle the code.

For CIOs managing enterprise software estates, this narrative doesn’t fully capture the complexity of their reality.

I’ve watched clients become captivated by the vibe coding promise. They see demos where AI generates a working prototype in minutes. They imagine their legacy modernization problems solved. Then they try applying these tools to a 25-year-old mainframe application processing millions of transactions daily and discover why speed alone doesn’t solve enterprise problems.

The gap between prototyping a new app and modernizing critical infrastructure isn’t about coding velocity. It’s about preserving decades of undocumented business logic while simultaneously transforming the technical foundation beneath it. That requires a fundamentally different approach than telling AI to “build me a customer portal.”

[Diagram: Two approaches to AI-assisted development. Credit: Dotun Opasina]

What vibe coding solves (and what it doesn’t)

Vibe coding — using natural language to prompt AI into generating code — has legitimate enterprise applications. A product manager can validate an idea without engineering resources. A business analyst can prototype a workflow automation without waiting for sprint capacity. A marketing team can build internal tools without IT tickets.

These are real productivity gains. When Sundar Pichai says vibe coding has “made coding so much more enjoyable,” he’s describing how AI removes friction from exploration and experimentation. The barrier between “I wish we had this” and “here’s a working version” has essentially collapsed.

But enterprise modernization isn’t exploration. It’s surgery on mission-critical systems where the patient can’t be sedated.

Consider the typical enterprise modernization scenario I encounter: A leading health care organization needed to modernize 10,000+ COBOL mainframe screens to improve claims processing and customer service. These systems were built before most current developers were born. The original architects retired years ago. Documentation is incomplete or contradictory. Business rules are embedded in code that nobody fully understands anymore.

Vibe coding tools can generate modern code quickly. What they can’t do is tell you whether that code implements the same business logic as the legacy system — logic that represents decades of regulatory compliance decisions, edge case handling and institutional knowledge that was never written down.

This is where the “vibe coding hangover” hits enterprise IT. Fast code generation creates new problems when applied to complex, tightly coupled systems.

The specification problem nobody talks about

Here’s the uncomfortable truth about AI-assisted development: AI generates perfect code for poorly defined problems.

I’ve seen this pattern repeatedly in client work. Teams use AI to accelerate development. Code gets written faster than ever. Then they discover the code solves the wrong problem because the requirements weren’t clear enough to begin with.

For greenfield projects building something new, you can iterate quickly. Wrong assumption? Rewrite it. Missed a requirement? Add it next sprint. The cost of mistakes is measured in developer time and missed deadlines.

For legacy modernization, mistakes compound differently. You’re not just building new functionality. You’re replacing systems that process payroll, manage inventory, handle financial transactions, route customer service calls — critical operations where “oops, we missed a business rule” isn’t acceptable.

Traditional modernization approaches tried to solve this through massive requirements-gathering efforts. Armies of business analysts documenting every screen, every workflow, every edge case. These projects took years and often failed because by the time you finished documenting, the business had evolved.

The enterprise-grade AI approach inserts a different layer: specification extraction.

Rather than jumping from legacy code to modern code, systems that work at enterprise scale first extract what the legacy system does — the business rules, the dependencies, the logic flow — into a clear specification. That specification becomes the source of truth for generating modern code. It’s verifiable. It’s traceable. It preserves institutional knowledge that exists nowhere else.
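
In code terms, the specification becomes an explicit, reviewable artifact that sits between the legacy system and its replacement. A minimal sketch follows; `extract_rules` and `generate_module` are hypothetical stand-ins for the analysis and generation stages a real platform would provide.

```python
from dataclasses import dataclass

@dataclass
class BusinessRule:
    """One extracted, human-reviewable unit of legacy behavior."""
    rule_id: str
    description: str
    source_ref: str   # where in the legacy system this rule was observed

def extract_rules(legacy_source: str) -> list[BusinessRule]:
    # Stand-in for static/dynamic analysis of the legacy system.
    return [BusinessRule("R-001",
                         "Reject claims older than 180 days",
                         "CLAIMS.CBL:1042")]

def generate_module(spec: list[BusinessRule]) -> str:
    # Stand-in for AI generation driven by the spec, not by the legacy
    # code directly -- every rule ID stays traceable in the output.
    lines = [f"# implements {r.rule_id}: {r.description}" for r in spec]
    return "\n".join(lines)

spec = extract_rules("...legacy COBOL source...")
# Humans review and sign off on the spec *before* generation begins.
modern_code = generate_module(spec)
```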

At Publicis Sapient, our proprietary AI platform Sapient Slingshot embodies this specification-first approach. When RWE needed to modernize a 24-year-old application with no source code or documentation, the platform analyzed the running system to extract business logic before generating replacement code. What would have taken two weeks of manual reverse-engineering happened in two days, with human oversight ensuring accuracy.

This isn’t about speed. It’s about preserving what works while transforming how it runs.

[Diagram: Why the specification layer matters. Credit: Dotun Opasina]

Why enterprise context changes everything

The difference between prototyping and production isn’t just scale. It’s context.

Vibe coding tools work well for isolated problems. Build a dashboard. Generate a data transformation script. Create an internal tool. These tasks have clear boundaries and limited dependencies.

Enterprise systems don’t have clear boundaries. A seemingly simple change to how customer addresses are validated might cascade through order processing, shipping logistics, tax calculation, fraud detection and customer service routing. Understanding those dependencies requires context that exists across thousands of files, dozens of databases and years of incremental changes.

This is where general-purpose AI coding assistants hit their limits. They can read individual files. They can suggest code completions. They can even generate multi-file changes. What they can’t do is understand how your 15-year-old inventory management system integrates with your 10-year-old order fulfillment platform, which talks to your 5-year-old customer service tool — and why changing one piece breaks another.

Enterprise-grade AI modernization requires building an Enterprise Context Graph — a living map of how code, architecture, data and business rules connect. This context allows AI to make informed decisions about modernization, not just fast guesses.
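
With such a graph in place, cascade questions become queries. A minimal sketch, assuming the graph is already populated; the system names and edges are hypothetical.

```python
from collections import deque

# Hypothetical dependency edges: "X -> Y" means Y consumes X's output.
context_graph = {
    "address-validation": ["order-processing", "fraud-detection"],
    "order-processing":   ["shipping", "tax-calculation"],
    "fraud-detection":    ["customer-service-routing"],
    "shipping":           [],
    "tax-calculation":    [],
    "customer-service-routing": [],
}

def blast_radius(graph: dict, changed: str) -> set[str]:
    """Every system a change to `changed` can cascade into (BFS)."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for downstream in graph.get(node, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen

print(blast_radius(context_graph, "address-validation"))
# A "simple" validation change touches five downstream systems.
```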

When a health care organization used this approach to modernize critical legacy systems, the platform identified hidden dependencies that would have caused production failures if missed. The AI didn’t just generate modern code faster. It generated modern code that worked in the complex environment where it needed to run.

[Diagram: AI coding context requirements. Credit: Dotun Opasina]

What this means for CIO technology strategy

The vibe coding phenomenon signals something important: AI is changing how software gets built. But for enterprise leaders, the strategic question isn’t “Can AI write code faster?” It’s “Can AI help us escape decades of technical debt while keeping critical systems running?”

The answer is yes — but only with the right approach.

  • Stop optimizing for coding speed. Your constraint isn’t how fast developers can write code. It’s how accurately you can understand and preserve business logic while modernizing the technical foundation. Tools that prioritize speed over comprehension will create more problems than they solve.
  • Start measuring specification accuracy. The new productivity metric isn’t lines of code generated. It’s code-to-spec accuracy — how reliably the generated code implements verified business requirements. Platforms achieving 99% code-to-spec accuracy enable modernization projects that were previously too risky to attempt. (See the sketch following this list.)
  • Treat institutional knowledge as a strategic asset. Your legacy systems contain decades of business logic that represents real competitive advantage — edge cases handled, regulatory requirements met, customer workflows optimized. Modernization approaches that discard this knowledge to move faster are destroying value in the name of speed.
  • Invest in context preservation, not just code generation. The winners in enterprise AI adoption won’t be organizations that generate code fastest. They’ll be organizations that can systematically extract, verify and modernize business logic at scale.
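
Code-to-spec accuracy, flagged in the list above, is measurable the same way test coverage is. A minimal sketch: each extracted rule carries a verification check, and accuracy is the share of rules the generated code demonstrably implements (the rule IDs and results are placeholders).

```python
from typing import Callable

def code_to_spec_accuracy(
    rules: list[str],
    verifier: Callable[[str], bool],
) -> float:
    """Share of spec rules whose verification check passes
    against the generated code."""
    if not rules:
        return 1.0
    passed = sum(1 for rule_id in rules if verifier(rule_id))
    return passed / len(rules)

# Placeholder results; in practice the verifier runs a generated
# test per rule against the modernized code.
results = {"R-001": True, "R-002": True, "R-003": False}
accuracy = code_to_spec_accuracy(list(results), results.get)
print(f"code-to-spec accuracy: {accuracy:.0%}")  # 67%
```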

The modernization opportunity hiding in plain sight

Here’s what makes March 2026 different from March 2024: We now have AI systems capable of reading legacy code, extracting business rules and generating verified modern replacements at enterprise scale. The technology matured.

According to the Stanford AI Index 2025, 78% of organizations used AI in 2024, up from 55% in 2023. But adoption and effectiveness are different metrics. Most organizations are still experimenting with AI tools for individual developer productivity.

The strategic opportunity isn’t faster coding. It’s systematic technical debt elimination.

Consider the typical enterprise IT budget: 60-80% goes to maintaining legacy systems. That maintenance cost compounds annually as skills become scarcer and systems become more brittle. Every dollar spent keeping COBOL running is a dollar not spent on innovation.

Vibe coding tools won’t solve this. They’re built for creation, not preservation. Enterprise modernization requires AI that understands what you have before transforming it into what you need.

Organizations applying this approach are seeing 75% faster delivery timelines, 40% higher productivity and up to 50% savings in modernization costs. More importantly, they’re tackling modernization projects that were previously shelved as too risky or expensive to attempt.

The specification-first future

The vibe coding phenomenon will continue to accelerate. More business users will build tools. More prototypes will become products. More organizations will democratize software creation beyond traditional engineering teams.

For CIOs, this creates both opportunity and risk.

The opportunity: Free your engineering teams from routine development by enabling business users to build their own solutions. The risk: Create a fragmented estate of AI-generated tools that nobody can maintain.

The solution requires treating AI-assisted development as a spectrum. Prototypes and internal tools can embrace the speed and accessibility of vibe coding. Mission-critical systems and legacy modernization need specification-first approaches that prioritize accuracy and traceability over velocity.

Your competitors are experimenting with AI coding tools. The question is whether they’re building sustainable transformation capabilities or accumulating a new generation of technical debt at AI speed.

The CIOs who understand this distinction will spend 2026 systematically eliminating legacy constraints, while others remain focused on incrementally improving existing systems. By 2027, that gap will be difficult to close. Vibe coding democratized software creation. Enterprise-grade AI makes transformation predictable. Choose your tools accordingly.

This article is published as part of the Foundry Expert Contributor Network.


The $570K canary: What AI coding agents reveal about enterprise AI’s real gaps

May 4, 2026, 07:00

Boris Cherny, creator of Anthropic’s Claude Code, says he hasn’t written a line of code by hand in months. He shipped 22 pull requests one day, 27 the next, all AI-generated. Company-wide, Anthropic reports that 70 to 90% of its code is now written by AI. CEO Dario Amodei has predicted that AI could handle “most, maybe all” of what software engineers do within months.

And yet Anthropic typically has dozens of software engineering openings, one reportedly carrying $570K in total compensation. As one observer noted, the company is simultaneously predicting the end of the profession and paying top dollar to hire into it.

Meanwhile, during his GTC 2026 keynote, NVIDIA CEO Jensen Huang said that 100% of NVIDIA now uses AI coding tools, including Claude Code, Codex and Cursor, often all three. Then, in a conversation on the All-In Podcast during GTC week, Huang sharpened the point: A $500,000 engineer who doesn’t consume at least $250,000 in AI tokens annually is like “one of our chip designers who says, guess what, I’m just going to use paper and pencil.”

This isn’t cognitive dissonance. It’s a signal. And CIOs who look past the headlines will find a pattern that explains not just where AI coding is going, but where all of enterprise AI is headed.

Tellers, not toll booth workers

The instinct is to see this as an extinction event. AI writes all the code; engineers become toll booth workers, replaced entirely by automation with no complementary role left behind. But the data tells a different story, one I explored in a recent CIO.com article on AGI skepticism.

When ATMs rolled out, bank teller employment didn’t collapse. It doubled, from 268,000 in 1970 to 608,000 in 2006. The machines eliminated the routine transaction. But cheaper branch operations meant banks opened more locations, which created demand for tellers who could handle complex financial conversations. Economists call this Jevons Paradox: When technology makes something more efficient, demand expands rather than contracts.

Software engineers are bank tellers, not toll booth workers. AI agents are eliminating routine implementation: The boilerplate, the CRUD endpoints, the standard test scaffolding. But that efficiency is expanding the total surface area of what “engineering” means. Anthropic isn’t paying $570K for someone to type code. They’re paying for the judgment to orchestrate AI agents that type code: Deciding what to build, evaluating whether the output is correct, governing what gets deployed and maintaining systems that are increasingly written by machines.

Cherny confirmed this shift directly. His team now hires generalists over specialists, because traditional programming specialties are less relevant when AI handles implementation details. The skill premium has moved from writing code to supervising it, from production to orchestration.

The reason AI coding agents work

Here’s the question CIOs should be asking: Why are AI agents succeeding in software development faster than in any other enterprise function?

It’s not because coding models are better than models for customer service, legal review or financial analysis. The underlying LLMs are the same. The difference is that software development already had the infrastructure that every other enterprise function lacks.

Developers didn’t build this infrastructure for AI. They built it for themselves, over decades. But it maps almost perfectly to the six infrastructure gaps that are currently blocking AI agents from moving beyond employee-facing pilots into customer-facing production.

6 gaps the SDLC already solved

1. Governance: Right data, right users, right permissions

In software development, governance is built into the workflow. Branch protection, code review policies and role-based access controls create a clear chain of permission from draft to deploy, whether the author is human or agent.

Most enterprise functions have nothing equivalent. When an AI agent drafts a customer response, accesses a patient record or modifies a financial model, the governance layer (who approved this action, what data was it allowed to see, which policies constrain its output) is either ad hoc or absent. Microsoft’s 2026 Cyber Pulse survey found that while 80% of Fortune 500 companies have deployed AI agents, only 47% have agent-specific security policies in place.

2. Observability: Trace and audit the decision trail

Every line of AI-generated code has a paper trail. Git blame shows who (or what) wrote it. CI/CD pipelines log every build, test and deployment. When something breaks in production, engineers can trace the failure from alert to commit to the specific agent session that produced the change.

Outside of engineering, AI agent decisions are largely opaque. A customer-facing agent that denies a claim or escalates a complaint leaves no audit trail. Without observability, enterprises can’t debug bad outcomes, satisfy regulators or build the trust necessary to expand agent autonomy.
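
Closing that gap starts with writing down the same trail engineering already gets from git blame and CI logs. A minimal sketch of an agent-action audit record; the field names are illustrative.

```python
import json
import uuid
from datetime import datetime, timezone

def audit_record(agent_id: str, session_id: str, action: str,
                 subject: str, trigger: str) -> str:
    """One append-only log line per agent decision, traceable later."""
    return json.dumps({
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,       # which system acted
        "session_id": session_id,   # which run of that system
        "action": action,           # what it did
        "subject": subject,         # what it acted on
        "trigger": trigger,         # why it acted (the context)
    })

# Illustrative: the denied claim from the paragraph above, made traceable.
print(audit_record(
    agent_id="claims-agent-v2",
    session_id="sess-7f3a",
    action="deny_claim",
    subject="claim-88421",
    trigger="policy_rule:out_of_network",
))
```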

3. Evaluation: Measure correctness at scale

Unit tests, integration tests, type checking, linting and automated QA give software engineering something no other enterprise function has: Continuous, objective measurement of whether AI-generated output is correct. That provides a foundation for proving an agent gets it right.

This is the gap other enterprise functions feel most acutely. DigitalOcean’s 2026 survey of 1,100 technology leaders found that 41% cite reliability as their number one barrier to scaling AI agents. Reliability is an evaluation problem: Without automated, continuous measurement of agent output quality, organizations can’t trust agents enough to put them in front of customers.
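
The engineering version of that measurement is scriptable, which is exactly the point. A minimal sketch: run a battery of checks over an agent-produced change and report an objective pass rate; the checks here are placeholders for a real test suite, linters and scanners.

```python
from typing import Callable

def evaluate_change(change: str,
                    checks: dict[str, Callable[[str], bool]]) -> float:
    """Objective, repeatable score for one agent-generated change."""
    results = {name: check(change) for name, check in checks.items()}
    for name, ok in results.items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    return sum(results.values()) / len(results)

# Placeholder checks; real ones run tests, linters and security scanners.
checks = {
    "unit_tests":   lambda c: "def test_" in c,
    "no_todo_left": lambda c: "TODO" not in c,
    "type_hints":   lambda c: "->" in c,
}
score = evaluate_change("def test_total() -> None: ...", checks)
print(f"pass rate: {score:.0%}")
```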

4. Memory: Persistent context beyond the context window

Developers take persistent context for granted. Version control, documentation and architectural decision records provide context that survives across sessions, teams and years. An AI coding agent can read the commit history, understand why a design choice was made in 2019, and factor it into today’s implementation.

Most enterprise AI agents operate in a memoryless state. Each customer interaction starts from scratch. Each agent session has no awareness of prior decisions, escalations or context beyond what fits in the context window. This is why employee-facing agents (IT help desks, NOC ticketing) succeed where customer-facing agents stall: Internal users tolerate repeating context. Customers do not.

5. Cost controls: Manage LLM spend across providers

Jensen Huang’s $250K-per-engineer token budget isn’t an abstraction. It’s a real cost management challenge that engineering teams are already navigating. Smart teams route differently depending on the task: Use a lightweight model for boilerplate generation, a reasoning model for architectural decisions and a code-specific model for refactoring. They set token budgets per agent session. They measure cost-per-PR and cost-per-feature, not just cost-per-token.

Enterprises deploying AI agents in other functions rarely have this granularity. When Goldman Sachs reported near-zero GDP impact from AI in 2025, the missing variable was cost discipline at the workflow level. Without the ability to route, throttle and measure LLM spend per agent task, scaling agents means scaling costs linearly, which eventually kills ROI.
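
Workflow-level cost discipline, in miniature: route each task class to the cheapest adequate model and account per task, not per token. A sketch with hypothetical model names and prices follows.

```python
# Hypothetical routing table: (model, $ per 1K tokens). Real prices vary.
ROUTES = {
    "boilerplate":  ("small-fast-model", 0.0002),
    "refactor":     ("code-model", 0.0010),
    "architecture": ("reasoning-model", 0.0060),
}

def run_task(task_type: str, tokens: int, ledger: dict) -> str:
    """Pick a model by task class and charge the workflow ledger."""
    model, price_per_1k = ROUTES[task_type]
    cost = tokens / 1000 * price_per_1k
    ledger[task_type] = ledger.get(task_type, 0.0) + cost
    return model

ledger: dict[str, float] = {}
run_task("boilerplate", 120_000, ledger)   # cheap model, big volume
run_task("architecture", 8_000, ledger)    # expensive model, used sparingly
for task, cost in ledger.items():
    print(f"{task}: ${cost:.2f}")          # cost per task class, not per token
```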

6. Deployment flexibility: Any cloud, on-prem, no lock-in

In software development, the runtime has always been portable. Code that runs on AWS today can run on Azure tomorrow, or on bare metal in your own data center. Containerization, Kubernetes and infrastructure-as-code tools like Terraform mean that engineering teams can change their minds about where workloads run without rewriting the application. Software has had this mindset for decades.

We’re early enough in this agentic development game that it’s tempting to take shortcuts. Organizations that build on a single hyperscaler’s agent framework find themselves locked into that provider’s model ecosystem, observability tooling and pricing structure. As agentic AI matures, deployment flexibility (the ability to run agents on any cloud, on-prem or across hybrid environments without vendor lock-in) will separate organizations that scale from those that stall.

Sometimes you’ll want agents to run close to your data. Other times, you’ll want agents close to the users. And you’ll want your developers to be able to move back and forth between different agent code bases without having to learn a different framework between them.

What CIOs should watch at Build and I/O

Google I/O and Microsoft Build will dominate May with dueling AI coding announcements. The temptation will be to compare model benchmarks. That’s the wrong lens. The models are converging. The real competition is one layer down, in the infrastructure that makes AI agents viable outside of software development.

CIOs watching these conferences should evaluate each announcement against the six gaps: Is Microsoft closing the governance gap with Azure AI Foundry? Is Google advancing observability through Vertex AI? Which platform is making it easier to evaluate agent output at scale, maintain persistent memory across sessions, control costs at the workflow level and deploy without lock-in?

The company that wins the AI coding war will be the one that builds the infrastructure layer that transfers to every other enterprise function. That’s the real stakes of May’s developer conferences, and it’s the real reason CIOs should be paying attention.

The canary’s message

Software engineers are the first knowledge workers to live inside a fully agentic workflow. They’re the canary in the coal mine for every other enterprise function. And right now, the canary is singing, not dying.

The lesson isn’t that AI coding agents have made engineers obsolete. It’s that AI coding agents work because engineers already built the infrastructure that makes agents trustworthy. Governance, observability, evaluation, memory, cost controls and deployment flexibility: These aren’t nice-to-haves. They’re the reason Anthropic can ship 27 AI-generated pull requests in a day and sleep at night.

Every other enterprise function will need to build its own version of that infrastructure before AI agents can move from employee-facing pilots to customer-facing production. The models aren’t the bottleneck. The scaffolding around them is.

Anthropic paying $570K for a software engineer whose job might not exist in a year isn’t a contradiction. It’s Jevons Paradox. And it’s the most expensive leading indicator in enterprise AI.

This article is published as part of the Foundry Expert Contributor Network.


Xiaomi releases MIT-licensed MiMo V2.5, targeting the market for long-running AI agents

April 28, 2026, 23:59

Xiaomi has released MiMo-V2.5 and MiMo-V2.5-Pro under the MIT license, publishing both as open source on April 27. Developers can use the models to build AI agents that carry out long-running tasks, such as coding and work automation, at lower cost.

Both models support a one-million-token context window. MiMo-V2.5-Pro is optimized for complex agentic and coding tasks, while MiMo-V2.5 is a natively omnimodal model that can process text, images, video and audio.

The release comes as agentic AI workloads put new pressure on enterprise AI budgets. Because these systems consume large volumes of tokens while planning tasks, calling tools, writing code and recovering from errors, cost management and deployment control are becoming more important to developers.

Xiaomi said the MIT license allows commercial deployment, continued training and fine-tuning without further approval. Tulika Sheel, senior vice president at global market research firm Kadence International, said that letting enterprises modify, deploy and commercialize the models without restriction is a structure rarely seen in today’s AI market.

Xiaomi also emphasized competitive performance. In a blog post, the company said MiMo-V2.5-Pro recorded a 64% pass rate on the ClawEval benchmark using only about 70,000 tokens, roughly 40-60% fewer than Claude Opus 4.6, Gemini 3.1 Pro and GPT-5.4 at similar performance levels.

Both models use a mixture-of-experts (MoE) architecture to keep compute costs in check. The 310-billion-parameter MiMo-V2.5 activates only 15 billion parameters per request, and the 1.02-trillion-parameter Pro version uses just 42 billion. Xiaomi also says the Pro model’s hybrid attention design can cut KV cache storage by up to nearly 7x on long-context work.

Xiaomi published examples meant to demonstrate long-horizon performance as well. MiMo-V2.5-Pro completed a Rust-based SysY compiler over 4.3 hours and 672 tool calls, passing all 233 hidden tests, and built an 8,192-line desktop video editor over 11.5 hours and 1,868 tool calls.

Will MiMo become an enterprise AI option?

Whether Xiaomi’s MiMo-V2.5 models can win adoption among enterprise developers for agentic coding and automation workloads, over closed frontier models, will come down to how performance, cost and risk are weighed.

Lian Jye Su, chief analyst at market research firm Omdia, said enterprise developers evaluating Xiaomi’s MiMo-V2.5 and its derivatives should center on total cost of ownership (TCO), which consists of token efficiency, cost per task success and the absence of the licensing fees that accompany proprietary models. Closed frontier models may still hold the edge on general tasks or the most demanding edge cases, he added, but open-weight models perform better on high-volume agentic work.

Pareekh Jain, CEO of consulting firm Pareekh Consulting, said enterprises should evaluate MiMo-V2.5 not as a replacement for Claude or GPT but as a cost-efficient agentic model for high-token workloads. The key benchmark metric is not raw accuracy but tokens per task success, he said: frontier models post high success rates on complex coding benchmarks, but at enormous inference cost. MiMo-V2.5, he noted, is designed around token efficiency and reaches similar outcomes with far fewer input and output tokens.

Jain said these traits let enterprises use models like MiMo as an economic core engine for repetitive coding, quality assurance (QA), migration, documentation, testing and automation work, though closed frontier models will keep setting the quality ceiling on the hardest tasks.
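
Jain’s tokens-per-task-success metric is straightforward to track. A minimal sketch follows; the run logs are placeholders, not benchmark results.

```python
def tokens_per_success(runs: list[dict]) -> float:
    """Total tokens spent (including failed runs) per successful completion."""
    successes = [r for r in runs if r["succeeded"]]
    if not successes:
        return float("inf")
    return sum(r["tokens"] for r in runs) / len(successes)

# Placeholder run logs for two models on the same task batch.
model_a = [{"tokens": 70_000, "succeeded": True},
           {"tokens": 90_000, "succeeded": False}]
model_b = [{"tokens": 150_000, "succeeded": True},
           {"tokens": 140_000, "succeeded": True}]
print(f"model A: {tokens_per_success(model_a):,.0f} tokens/success")
print(f"model B: {tokens_per_success(model_b):,.0f} tokens/success")
```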

Ashish Banerjee, senior director analyst at Gartner, said models like MiMo could meaningfully change enterprise AI economics for long-running agents. Once tasks scale to millions of tokens, he said, pay-as-you-go proprietary APIs become a recurring cost burden rather than a convenience, whereas MiMo’s MIT license, open weights, one-million-token context and relatively low price make it strategically competitive even in private cloud or self-hosted environments.

That does not mean enterprises will abandon proprietary APIs altogether. Banerjee expects enterprises to keep using proprietary APIs where accuracy requirements are high or operational burden must be minimized, while repeatable agentic workflows at scale shift to open models, where cost predictability, data control and customization matter. The market for long-running, large-scale agentic AI, he added, will settle into a hybrid structure, with open models like MiMo reducing API dependence.

Su cautioned, meanwhile, that the model’s Chinese origin could be a concern for heavily regulated Western enterprises and a potential obstacle to adoption.


Your AI coding agent isn’t a tool. It’s a junior developer. Treat it like one

April 23, 2026, 07:00

Yet that is precisely how most organizations are deploying AI coding agents today. The prevailing narrative around “AI-powered development” frames these systems as productivity tools. Vibe-coding and agentic coding are considered something closer to a faster autocomplete or a more sophisticated IDE plugin. Flip the switch, the story goes, and suddenly your engineering organization becomes dramatically more efficient. Everyone is “all in” on the first hand of cyber-Texas Hold ’Em. That mental model is wrong.

AI coding agents are not tools. They behave far more like junior developers: Capable, energetic, sometimes brilliant, but absolutely capable of causing catastrophic damage if given autonomy before they understand and respect the environment they’re operating in.

The organizations that treat AI coding agents like tools will create and accumulate technical debt at unprecedented speed. The organizations that treat them like junior engineers by onboarding them as talent, pairing with them and teaching them context will unlock the productivity gains everyone is chasing. The difference between those outcomes is not the technology. It is the management model.

The lesson every engineer learns early

Midway through the DevOps phase of my career, I worked at the CME Group, where the exchange operates one of the most critical financial infrastructures on the planet. The CME processes roughly a quadrillion dollars’ worth of contracts annually and, at the time, ran across five datacenters with more than 10,000 servers, including racks of Oracle Exadata systems costing hundreds of thousands of dollars each. The biggest SIFI of SIFIs.

You did not get root access to that environment on day one.

Instead, you were paired with a mentor. Your mentor was part of a buddy system for onboarding new hires and was effectively a docent for the infrastructure. My mentor was a deeply technical manager named Matt, one of the most capable engineers I have ever worked with. His job wasn’t simply to show me which commands to run or where to find documentation. His job was to teach me how to ask the platform, a system of systems, meaningful questions.

When you’re managing infrastructure at that scale, every question returns thousands of answers.

  • Are the matching engines pinned correctly to CPU cores?
  • Are cgroups configured properly for workload isolation?
  • Which RAID arrays are starting to show drive failures?
  • Are firmware and BIOS versions aligned across production and QA?

None of this can be learned through a quick tutorial or a training video. You learn by doing. You learn by working through the ticket queue, performing dry runs, preparing rollback plans and executing changes within narrow maintenance windows (a few minutes per week).

The lesson wasn’t simply technical. It was epistemological. Engineering expertise is not about knowing commands. It is about knowing which questions matter and how to understand the response. And that knowledge only develops through mentorship, iteration and experience.

Why the pair-programming model matters

The software industry already solved this problem decades ago through a practice called pair programming. In agile teams, a senior developer pairs with a junior one. They work together on the same problem in real time. The junior developer contributes energy and fresh thinking, while the senior developer contributes experience and judgment. The result is faster capability development without sacrificing quality. At first, it might seem an expensive allocation of resources, but when you think it through, it is really a strong knowledge management technique.

AI coding tools are like a super-smart baby: a nascent intelligence as eager as any recent college graduate, but without much real-life experience solving real-world problems, because it cannot draw on a body of lived experience and hard-won lessons in software development, release engineering and debugging. That description should sound familiar. It is essentially the profile of a junior developer.

The implication is obvious once you see it: the most effective deployment model for AI coding agents is the same pairing model that works for human developers. Human plus agent.

Not a human supervising an agent after the fact. Not just a human reviewing pull requests from an automated pipeline. But genuine co-development, with contextual education on why the vulnerability should not be introduced in the first place. When that pairing works, the productivity gains are real. When it doesn’t, you ship vulnerabilities faster than your security team can ever hope to triage them.

What the agent gets wrong first

The first time I worked alongside a coding model on a real security problem, the mistake it made was subtle but revealing. I was experimenting with ways to harden an API without introducing latency or complexity on the client side. The goal was to produce a transparent security uplift that improved the API’s defensive posture without forcing developers to substantially change how they interacted with the service.

The model generated plausible suggestions quickly. Too quickly. Some of the techniques it proposed were technically correct but operationally obsolete. Others referenced security mechanisms that had been deprecated. Still others ignored non-functional requirements around compliance or performance. In other words, the model surfaced relevant information but lacked the judgment to distinguish wheat from chaff. 

There is also a tendency to accept the legitimacy of the ask rather than questioning the assumptions and baseline parameters of the situation. The agent is not going to think outside the box (unless it is hallucinating a nonexistent function, package or library that solves the problem). It assumes the question it has been asked to solve is a legitimate and valid one.

Humans develop that discernment over time. It’s part of how we move from data to information to knowledge to wisdom. What information scientists have called the DIKW pyramid.

Models don’t struggle their way up that pyramid. They jump directly to conclusions. The struggle, however, is a messy process of trial, failure and iteration, but it is where human experience and knowledge form. That knowledge is then further refined and distilled into wisdom. When that process is skipped, real expertise never develops. This is why treating AI coding agents as tools is dangerous. Tools don’t need to exercise judgment. Junior developers do.

How trust actually develops

Think about the best junior engineer you ever worked with. How long did it take before you trusted them to work independently? Rarely less than months. Oftentimes a year or more.

Trust emerges gradually. It grows from observing how someone works through problems: how they document changes, how they write tests, how they think about rollback procedures and how they anticipate edge cases and race conditions. In my own teams, I’ve always preferred a management philosophy of 100% freedom and 100% responsibility (Netflix Manifesto circa 2001).

Engineers on my teams are expected to behave like owners of the company. They are indoctrinated to commit infrastructure changes as code. They document their reasoning. They attach testing artifacts to their pull requests. We track progress not just by time spent but by contributions: Commits, documentation, testing evidence and operational discipline.

That process shapes junior engineers into reliable junior engineers. The exact same logic applies to AI coding agents. Trust should expand progressively.

  • At first, the agent proposes little code snippets and stanzas.
  • Then it drafts functions and packages libraries.
  • Eventually, it might implement entire features, but only after proving it understands the environment and the risk appetite of the company.

Skip those steps, and you aren’t accelerating development. You’re accelerating chaos, driven by FOMO and FUD.
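
That progression can be encoded rather than left to habit. Here is a minimal sketch of a trust ladder keyed to an agent’s track record; the tier names and thresholds are illustrative.

```python
# Illustrative autonomy tiers, unlocked by changes accepted without rework.
TIERS = [
    (0,   "snippets"),    # proposes snippets; a human writes the change
    (25,  "functions"),   # drafts functions behind mandatory review
    (100, "features"),    # implements features; a human approves the plan
]

def autonomy_tier(accepted_changes: int) -> str:
    """Map an agent's track record to its current autonomy level."""
    level = TIERS[0][1]
    for threshold, name in TIERS:
        if accepted_changes >= threshold:
            level = name
    return level

assert autonomy_tier(3) == "snippets"
assert autonomy_tier(40) == "functions"
assert autonomy_tier(250) == "features"
```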

Learning from more than one chef

Over the course of my career, I’ve worked across a wide range of industries: dot-com era web development in San Francisco, trading infrastructure in European financial markets, cloud transformations for legacy enterprises and large-scale infrastructure engineering.

Each environment changed how I thought about software and security. The dot-com era taught speed and experimentation. European financial institutions taught rigorous project governance (PRINCE2 anyone?). Large-scale options and commodity exchanges taught what real operational resilience looks like.

Those experiences fundamentally reshaped how I approach engineering problems. AI agents will benefit from the same diversity. Pairing them with multiple engineers, and rotating those pairings over time, will expose them to different coding styles, architectural philosophies and security techniques. The result is best practices, but not the monolithic kind aggregated and homogenized by token-prediction algorithms trained on billions of lines of code. Just as aspiring chefs learn from multiple masters, agents improve faster when exposed to varied expertise.

A warning for CISOs

Many security leaders today are under pressure to reduce developer headcount because executives believe AI can absorb the workload. This assumption misunderstands both security and AI. If an organization already has strong security discipline, with well-documented architectures, clear coding standards and mature review processes, then AI agents will amplify that mindset and culture.

But if the organization has weak security habits, AI will amplify those weaknesses even faster. Human knowledge is like sunlight. Large language models are more like moonlight. A mere reflection of that knowledge. You cannot build a thriving ecosystem entirely under moonlight. Sooner or later, you need the sun, despite what the vampires and werewolves howling at the moon might lead you to believe.

The real promise of AI development

None of this is an argument against AI coding tools. Used properly, they are extraordinary collaborators. They can surface patterns across massive codebases, accelerate documentation and help engineers explore alternative designs more quickly than ever before.

But unlocking that potential requires the right mental model: not a tool, but a junior developer. Onboard them. Pair with them. Teach them your systems; regale them with your stories of isolating a bug or race condition that took weeks to pinpoint. Rotate them across your teams. Expand their responsibilities gradually as trust develops.

That investment phase is what transforms AI from a novelty into a genuine multiplier. And like every good mentorship relationship in engineering, the payoff compounds over time. Treat your AI coding agent like a disposable tool and you’ll get disposable code (aka slop).

Treat it like a junior developer and you might just raise up the best engineering partner you’ve ever had.

This article is published as part of the Foundry Expert Contributor Network.


Salesforce launches Headless 360 to support agent-first enterprise workflows

April 16, 2026, 05:56

Salesforce is packaging its developer and AI tooling, including its vibe coding environment Agentforce Vibes, into a new platform named Headless 360, designed to help enterprise teams build agent-first workflows.

The CRM software provider defines agent-first workflows as enterprise processes in which software agents, rather than human users, carry out tasks by directly invoking APIs, tools, and predefined business logic.

To support this approach, Headless 360 exposes Salesforce’s underlying data, workflows, and governance controls as APIs, MCP tools, and CLI commands, via its existing offerings, such as Data 360, Customer 360, and Agentforce, Joe Inzerillo, president of AI technology at Salesforce, said during a press briefing.

This allows agents to operate directly on the platform’s existing business logic and datasets, rather than relying on separate integrations or user interfaces, Inzerillo added.
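As a rough illustration of that pattern, an agent-first task invokes business logic directly over HTTP rather than driving a user interface. The route, payload fields and token below are hypothetical placeholders, not Salesforce’s actual API.

    # Hypothetical sketch of an agent-first task: the agent calls business
    # logic directly instead of navigating a UI. The endpoint and payload
    # are placeholders only; they are not Salesforce's actual API.
    import json
    import urllib.request

    def escalate_case(base_url: str, token: str, case_id: str) -> dict:
        payload = json.dumps({"caseId": case_id, "action": "escalate"}).encode()
        req = urllib.request.Request(
            url=f"{base_url}/hypothetical/v1/case-actions",  # placeholder route
            data=payload,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:  # the agent acts; no UI involved
            return json.load(resp)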

Push to become a control layer for enterprise AI agents

Analysts, however, see Headless 360 as an effort by Salesforce to position itself as a central layer for managing agent-driven operations across different business functions in enterprises, moving from a system of record to a system of execution.

“Salesforce knows the center of gravity is moving toward coding agents, conversational interfaces, agent harnesses, and external runtimes, so it is trying to keep Salesforce relevant as the system underneath,” said Dion Hinchcliffe, VP of the CIO practice at The Futurum Group.

With Headless 360, Hinchcliffe added, Salesforce is trying to move its positioning beyond “AI agents inside Salesforce” to framing “Salesforce as a programmable platform for agents operating across external tools, interfaces, and environments.”

Analysts warn that CIOs should exercise caution before adopting Headless 360.

Scott Bickley, advisory fellow at Info-Tech Research Group, said modern data stacks can replicate much of Headless 360’s functionality with more flexibility and less vendor concentration.

There are other issues that Bickley thinks should worry CIOs: “There is no mention of cost or the underlying licensing model for this ‘headless’ experience. Are all tools included at no cost?”

“Salesforce’s MO seems to be to announce new capabilities that require SKUs. CIOs should be asking about pricing now, before building in architectural dependencies on features that might land in a premium cost tier,” Bickley cautioned. 

Also, the analyst pointed out that Salesforce’s announcement is silent on SLAs for operations such as MCP tool calls, which matter materially for real-time agent workflows.

Incremental gains for developers despite broader concerns

Despite these concerns, Bickley sees some of the new Headless 360 features, though undifferentiated from the competition, as offering practical benefits for developers in their daily tasks.

The analyst was referring to newer updates, such as new MCP tools that give external coding agents full access to Salesforce’s platform, the DevOps Center MCP, the Agentforce Experience Layer, and newer governance features.

Enabling full access for external coding agents such as Claude Code and Codex, in particular, helps Salesforce meet developers where they are and lets them continue using the tools of their choice, Bickley said.

“Historically, developers were forced into Salesforce’s proprietary toolchain that included clunky VS Code extensions, painful metadata APIs, and quirky development pipelines that required Salesforce-specific expertise. Expanding the dev environment helps alleviate this pain,” Bickley pointed out.

The other updates, according to Hinchcliffe, should curtail developer friction by avoiding frequent switching between development tools, expanding real-time awareness of organization data, reducing the need for custom plumbing to expose business logic, and decreasing the effort needed to move from prototype to deployment.

Focusing specifically on the new DevOps Center MCP, a set of AI-powered tools that enables natural-language interaction across the entire DevOps lifecycle, Bickley said it will help alleviate developers’ pain around CI/CD processes.

“Salesforce development pipelines are notoriously fragile with metadata dependencies, org-specific configurations, artificial limits on work items, and UI response issues, among others,” Bickley added.

Concerns around the maturity of governance capabilities

The governance tools, specifically the updates to the Testing Center, Custom Scoring Evals, Session Tracing, and the A/B Testing API, also address real gaps that enterprise development teams face, Hinchcliffe said, especially when moving agentic workflows or applications into production.

“Salesforce is correctly identifying that enterprise agent adoption will stall unless buyers can properly measure, govern, debug, and tune agent behavior over time,” the analyst said.

However, Bickley cautioned about the efficacy of these tools, as most are in the very early stages of their release. In fact, he suggested that enterprises should expect to supplement them with their own evaluation frameworks for the next 12 to 18 months.

The analyst also flagged additional concerns around newer components such as the Agentforce Experience Layer, which is a new UI service that allows developers to decouple what an agent does from how it surfaces across various services and applications.

“Ironically, this adds yet another layer to contend with in what is already considered a painful development experience. Salesforce has a pattern of shipping v1 tools that work great in demos but fall short in real-world scenarios,” Bickley said.

“Development teams intending to avail themselves of these new feature sets should insist that Salesforce provide them an extended pilot and sandbox free of charge to validate the maturity level and ease of use of these new features,” Bickley added.

All the updates to Headless 360, Salesforce said, are expected to be released in phases. Generally available features include Agentforce Vibes 2.0, the DevOps Center MCP, Session Tracing, and the Agentforce Experience Layer. Features that are in early access include Custom Scoring Evals. Other features, such as the Testing Center and the Salesforce Catalog, are scheduled for rollout in May and June, respectively.

This story first appeared on InfoWorld.
