TrustFall" Attack: How Hackers Exploit AI Coding Agents to Hijack Supply Chains

For most software engineers, the “Trust” button is a reflex. We click through security prompts and accept repository permissions with a speed that would make a security auditor shudder, all in the name of velocity. It is a habit born of necessity in a fast-paced development cycle, but according to new research, that habit has become a wide-open door for threat actors.

Security researchers at Adversa.AI have uncovered a critical vulnerability they’ve dubbed “TrustFall,” a flaw that allows attackers to seize full control of a developer’s system through AI-powered coding assistants. The attack doesn’t require a complex exploit or a zero-day vulnerability in the traditional sense. instead, it weaponizes the very autonomy that makes these new AI agents so appealing.

As a former software engineer, I’ve seen how the industry prioritizes “frictionless” workflows. But TrustFall proves that when you remove friction from security, you often remove the security itself. By manipulating the way AI agents interact with external code, attackers can execute malicious scripts with full system privileges, potentially turning a single developer’s workstation into a launchpad for a global software supply chain attack.

The Anatomy of a One-Click Compromise

The vulnerability centers on the rise of “agentic” AI—tools like Claude Code that don’t just suggest snippets of code but can autonomously scan repositories, run commands, and manage files. To function, these tools rely on the Model Context Protocol (MCP), a standard designed to let AI models interact with external data and tools.

View this post on Instagram about Claude Code, Model Context Protocol

From Instagram — related to Claude Code, Model Context Protocol

The TrustFall attack begins with a “honey pot” repository on GitHub. An attacker places a seemingly useful project online, waiting for a developer to use an AI agent to analyze it. When the agent clones the repository, it triggers a security dialog. In the case of Claude Code, the prompt asks: “Quick safety check: Is this a project you created or one you trust?”

The danger lies in the default setting: the answer is pre-set to “Trust.” A single press of the Enter key—a movement many developers perform without looking—is all it takes. Once trusted, the AI agent reads configuration files within the repository, such as .claude/settings.json or .mcp.json. If an attacker has inserted a parameter like enableAllProjectMcpServers, the tool automatically approves and launches any MCP server defined in that project.

Because these servers run as non-sandboxed operating system processes, they inherit the full privileges of the developer. This allows the attacker to establish a permanent connection to a Command-and-Control (C2) server, granting them remote access to the machine. Most concerningly, the malicious payload can be embedded directly within the JSON configuration, leaving no separate script file for static security scanners to detect.

From Local Machine to Global Supply Chain

While a compromised laptop is a serious issue, the systemic risk is far greater. Modern software is built through Continuous Integration and Continuous Deployment (CI/CD) pipelines. These pipelines are the arteries of the tech world; if they are poisoned, every user of the resulting software is at risk.

Alex Polyakov, co-founder and CTO of Adversa.AI, warns that developers of widely used open-source tools are the primary targets. If an attacker compromises the machine of a maintainer for a popular library, they can steal environment variables, deploy keys, and signature certificates. These credentials can then be used to inject malicious code directly into a production build, bypassing traditional code reviews.

This is the nightmare scenario of supply chain security: a “silent” infection where the developer believes they are using an AI assistant to increase productivity, while the assistant is actually acting as a Trojan horse for a sophisticated state-sponsored actor or cybercriminal group.

A Systemic Blind Spot Across AI Giants

The TrustFall research reveals that this is not a bug unique to one company, but rather a shared industry convention. Adversa.AI tested the same attack chain against several of the most prominent AI CLI tools, finding an identical pattern of vulnerability.

AI Coding Agents Breached – Attackers Took the Keys

AI Tool	Vulnerable to TrustFall?	Default Trust Setting	Execution Level
Claude Code	Yes	Enabled (Trust)	Full OS Privileges
Gemini CLI	Yes	Enabled (Trust)	Full OS Privileges
Copilot CLI	Yes	Enabled (Trust)	Full OS Privileges
Cursor CLI	Yes	Enabled (Trust)	Full OS Privileges

Serge Malenkovich, a communications advisor at Adversa, noted that the vulnerability is a “convention” shared across agentic coding CLIs. By prioritizing the user experience and reducing the number of prompts, these tools have created a standardized vulnerability that attackers can exploit across different platforms.

The Debate Over “Informed Consent”

The response from the AI labs has been polarizing. When Adversa.AI reported the findings to Anthropic, the company reportedly declined to implement immediate changes. Anthropic’s position is rooted in the concept of user agency: if a user explicitly clicks “Yes, I trust this folder,” they have accepted the risks associated with the content of that folder.

Researchers argue that this is a fundamental misunderstanding of “informed consent.” In a complex repository with hundreds of nested folders and hidden JSON files, a user cannot possibly know what they are trusting at the moment they click the button. They are trusting the project, not necessarily a hidden configuration file that triggers a remote shell.

Adversa suggests a simple architectural fix: blocking critical configuration keys like enableAllProjectMcpServers when they appear inside a cloned repository, requiring these settings to be defined in a global, user-controlled configuration file outside the project’s directory.

How to Protect Your Workflow

Until AI providers move away from “reflexive trust” models, developers must adopt a more skeptical approach to agentic tools. To mitigate the risk of a TrustFall attack, consider the following practices:

Isolate AI Agents: Run AI coding assistants within a containerized environment (like Docker) or a dedicated virtual machine to prevent them from accessing your host OS privileges.
Manual Audit First: Never run an AI agent on a cloned repository until you have manually inspected the .json configuration files and hidden directories.
Pipeline Hygiene: Use AI agents only on feature branches that have undergone manual peer review before being merged into the main CI/CD pipeline.
Restrict Permissions: Avoid running CLI tools with administrative or root privileges.

The industry is currently at a crossroads. We are moving toward a future where AI agents do more of our heavy lifting, but that autonomy requires a new security paradigm. The “Trust” button is no longer a convenience; it is a liability.

The next critical checkpoint will be the industry’s response to the Model Context Protocol’s evolution. As MCP becomes more widely adopted, the community will need to decide whether security defaults should favor the developer’s speed or the system’s integrity. We expect further updates from the security community as more researchers attempt to replicate and expand upon the TrustFall findings.

Do you trust your AI assistants with full system access? Let us know your thoughts in the comments or share this story with your engineering team.

TrustFall” Attack: How Hackers Exploit AI Coding Agents to Hijack Supply Chains

The Anatomy of a One-Click Compromise

From Local Machine to Global Supply Chain

A Systemic Blind Spot Across AI Giants

The Debate Over “Informed Consent”

How to Protect Your Workflow

Related

Bastl Kalimba: A Unique Hybrid Synthesizer

How hotels are stopping the ‘dawn dash’ for sunbeds after man wins payout – BBC

You may also like

Leave a Comment Cancel Reply