The rapid deployment of AI agents in the enterprise has created a dangerous architectural blind spot: for most companies, AI agent credentials live in the same box as untrusted code. In this “monolithic” pattern, the AI model’s reasoning, the tools it calls, and the sensitive API keys it uses to access corporate data all coexist within a single process. If an attacker triggers a prompt injection, they don’t just compromise the agent—they inherit the keys to the kingdom.
The scale of this vulnerability is becoming clear. While PwC’s 2025 AI Agent Survey indicates that 79% of organizations already use AI agents, security approvals have not kept pace. According to the Gravitee State of AI Agent Security 2026 report, only 14.4% of 919 surveyed organizations reported full security approval for their agent fleets. This gap has led the Cloud Security Alliance (CSA) to describe the current state of agentic trust and governance as a “governance emergency.”
The industry’s urgency was on full display at RSAC 2026, where keynotes from Microsoft, Cisco, CrowdStrike, and Splunk converged on the same conclusion: zero trust must be extended to AI agents. Cisco’s Jeetu Patel described agents as behaving “more like teenagers, supremely intelligent, but with no fear of consequence,” while Matt Caulfield, Cisco’s VP of Product for Identity and Duo, argued that security must move beyond one-time authentication to a model of continuous verification for every single action an agent attempts.
The Monolithic Liability and the ‘ClawHavoc’ Warning
For most developers, the default way to build an agent is to wrap everything in a single container. This creates a massive blast radius. Because OAuth tokens and git credentials sit in the same environment where the agent executes generated code, a single successful injection lets an attacker exfiltrate tokens or spawn unauthorized sessions. The risk is compounded by poor attribution: a CSA and Aembit survey found that 68% of IT professionals cannot distinguish agent activity from human activity in their logs.
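To make the blast radius concrete, here is a minimal sketch of the monolithic anti-pattern: secrets sit as environment variables in the same process that executes model-generated code. All names and values (`SLACK_TOKEN`, `run_generated_code`) are hypothetical illustrations, not any vendor’s actual code.

```python
# Illustrative sketch of the monolithic anti-pattern: secrets and
# untrusted code execution share one environment. Names are hypothetical.
import os
import subprocess
import sys

# Secrets injected as environment variables into the SAME process
# that will later execute model-generated code.
os.environ["SLACK_TOKEN"] = "xoxb-hypothetical-token"
os.environ["GIT_CREDENTIALS"] = "ghp-hypothetical-token"

def run_generated_code(code: str) -> str:
    """Execute model-generated code in the agent's own environment.

    The child process inherits os.environ by default, so one injected
    payload that reads the environment exfiltrates every secret.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout

# A prompt-injected "task" needs only one hop to reach the tokens:
leaked = run_generated_code("import os; print(os.environ['SLACK_TOKEN'])")
print(leaked)
```

A single `print(os.environ)` smuggled into a generated snippet is enough; no second stage, no privilege escalation.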
The real-world consequences of this fragility were highlighted by CrowdStrike CEO George Kurtz, who pointed to the “ClawHavoc” campaign. This supply chain attack targeted the OpenClaw agentic framework, with Antiy CERT confirming 1,184 malicious skills tied to 12 publisher accounts. Research from Snyk found that 13.4% of scanned ClawHub skills contained critical security flaws. Most alarming is the speed of exploitation: the CrowdStrike 2026 Global Threat Report notes that the fastest observed breakout time has dropped to just 27 seconds.
Two Paths to Zero Trust: Anthropic vs. Nvidia
As the monolithic pattern proves untenable, two distinct architectures have emerged to define where the blast radius actually stops. Anthropic and Nvidia have both shipped public zero-trust frameworks, but they approach the problem from opposite directions: one focuses on structural separation, the other on rigorous containment.
Anthropic: Decoupling Brain from Hands
Launched in public beta on April 8, Anthropic’s Managed Agents split the agent into three mutually distrustful components: the “brain” (Claude and the routing harness), the “hands” (disposable Linux containers for code execution), and a “session” (an external, append-only event log).
In this model, credentials never enter the execution sandbox. OAuth tokens are stored in an external vault; when an agent needs to call a tool, it sends a session-bound token to a proxy, which fetches the real credential and executes the call. The agent never sees the actual token. This structural removal of credentials means a compromised sandbox yields nothing for an attacker to reuse. The design also improved performance, dropping the median time to first token by roughly 60%.
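The token-exchange flow described above can be sketched in a few lines. This is a hedged illustration of the pattern, not Anthropic’s implementation: `VAULT`, `SESSIONS`, and both function names are hypothetical stand-ins.

```python
# Minimal sketch of the vault + session-token proxy pattern.
# VAULT, SESSIONS, and the function names are hypothetical stand-ins.
import secrets

VAULT = {"slack": "xoxb-real-token"}   # real credentials, host-side only
SESSIONS: dict[str, str] = {}          # opaque session token -> tool name

def issue_session_token(tool: str) -> str:
    """Mint a short-lived, session-bound handle for the sandbox.

    The sandbox receives only this opaque token, never the credential.
    """
    token = secrets.token_urlsafe(16)
    SESSIONS[token] = tool
    return token

def proxy_call(session_token: str, request: str) -> str:
    """Host-side proxy: swap the session token for the real credential
    and make the upstream call on the sandbox's behalf."""
    tool = SESSIONS.get(session_token)
    if tool is None:
        raise PermissionError("unknown or expired session token")
    real_credential = VAULT[tool]
    # ... perform the real API call with real_credential here ...
    return f"called {tool} ({request}); sandbox never saw the credential"

# Inside the sandbox, a compromised agent holds only the opaque handle:
handle = issue_session_token("slack")
print(proxy_call(handle, "post_message"))
```

The key property: exfiltrating `handle` gains the attacker nothing, because the proxy only honors it from within the live session and the real token never crosses the boundary.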
Nvidia: The Layered Fortress
Nvidia’s NemoClaw, released in early preview on March 16, keeps the agent and execution environment together but wraps them in five layers of enforcement. This includes kernel-level isolation via Landlock and seccomp, and a “default-deny” outbound network policy that requires manual operator approval via YAML.
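The article does not show NemoClaw’s actual policy format, but the default-deny egress idea can be sketched as a simple allowlist check, with `APPROVED_HOSTS` standing in for the operator-approved YAML entries. Everything here is an assumption for illustration.

```python
# Minimal sketch of a default-deny outbound network check. The allowlist
# stands in for operator-approved YAML entries; names and structure are
# assumptions, not NemoClaw's actual policy format.
from urllib.parse import urlparse

APPROVED_HOSTS = {"api.anthropic.com", "pypi.org"}  # operator-approved

def egress_allowed(url: str) -> bool:
    """Default-deny: permit only explicitly approved destinations."""
    host = urlparse(url).hostname
    return host in APPROVED_HOSTS

assert egress_allowed("https://pypi.org/simple/requests/")
assert not egress_allowed("https://attacker.example/exfil")  # denied by default
```

The operational trade-off follows directly from the design: every new legitimate destination is a blocked connection until a human edits the policy, which is why operator load scales with agent activity.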
NemoClaw’s strength lies in observability. A real-time Terminal User Interface (TUI) logs every action and blocked connection. Still, this high visibility comes at a cost: operator load scales linearly with agent activity. Unlike Anthropic, NemoClaw lacks an external session recovery mechanism; if the sandbox fails, the agent’s state is lost.
| Feature | Anthropic Managed Agents | Nvidia NemoClaw |
|---|---|---|
| Credential Location | External Vault (Structural Isolation) | In-Sandbox / Policy-Gated |
| Execution Model | Disposable Containers (Hands) | Layered Sandbox (Fortress) |
| State Durability | External Session Log | Internal Sandbox Files |
| Operator Burden | Low (Console Tracing) | High (Manual Approval/TUI) |
The Credential Proximity Gap
The critical divergence between these two systems is “credential proximity.” In Anthropic’s architecture, an attacker must perform a “two-hop” attack—influencing the brain’s reasoning and then convincing it to act through a token-less container. Single-hop exfiltration is structurally impossible.
In NemoClaw, while the privacy router keeps inference keys on the host, integration tokens (such as those for Slack or Discord) are injected into the sandbox as environment variables. While these are policy-gated, they remain physically proximate to the execution environment. This creates a vulnerability to “indirect prompt injection,” where an agent processes a poisoned web page or API response. Because the injected instructions enter the reasoning chain as trusted context, they sit immediately next to the credentials they seek to steal.
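The proximity difference can be shown in a few lines. This is an illustrative contrast, not either vendor’s code; all variable names and values are hypothetical.

```python
# Illustrative contrast of credential proximity (all values hypothetical).
import os

# Pattern A: integration token injected into the sandbox as an env var.
# Any injected instruction that reaches code execution reads it directly.
os.environ["SLACK_TOKEN"] = "xoxb-real-token"
one_hop_loot = os.environ["SLACK_TOKEN"]      # real secret, one hop away

# Pattern B: the sandbox holds only an opaque session handle; the real
# token lives host-side behind a proxy, so the same read yields nothing
# reusable outside the live session.
os.environ["SLACK_SESSION"] = "sess-9f2c"     # opaque handle only
two_hop_loot = os.environ["SLACK_SESSION"]    # useless without the proxy

assert one_hop_loot.startswith("xoxb-")  # pattern A: credential exposed
assert "xoxb" not in two_hop_loot        # pattern B: nothing to exfiltrate
```

Under pattern A, the poisoned context and the secret share one address space; under pattern B, the attacker must additionally steer the agent’s reasoning into making a proxied call on their behalf, which is the second hop.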
David Brauchler of the NCC Group argues for “trust segmentation,” where AI systems inherit the trust level of the data they process. While both vendors are moving toward this goal, neither has fully eliminated the risk of untrusted input influencing privileged actions.
For security directors, the path forward involves five immediate priorities:

- Audit all deployed agents for the monolithic pattern.
- Require structural credential isolation in RFPs.
- Test session recovery to prevent data loss during sandbox crashes.
- Staff for the specific observability model of the chosen vendor.
- Demand a clear roadmap for mitigating indirect prompt injection.
The shift toward agentic zero trust has moved from theoretical research to active implementation. As enterprises move from beta tests to production fleets, the gap between deployment speed and security approval remains the primary vector for the next generation of corporate breaches. The next critical checkpoint will be the release of finalized governance frameworks from the CSA and further public beta updates from Anthropic and Nvidia regarding indirect injection mitigations.
Do you believe structural isolation or layered monitoring is the more viable path for enterprise AI? Share your thoughts in the comments.
