LLMs Can Hide Secret Messages in Plain Text

By Priyanka Patel, Tech Editor

The art of the secret message is as old as writing itself, from Caesar’s shifted ciphers to invisible ink in the margins of wartime letters. But for those of us who spent years in software engineering before moving into reporting, the evolution of “hiding things in plain sight” has always been a digital game—usually involving the manipulation of least-significant bits in a JPEG or hiding data in the noise of an audio file.

Now, that game has moved into the realm of Large Language Models (LLMs). Recent research into text-in-text steganography reveals that LLMs are not just capable of generating human-like prose; they are exceptionally skilled at using that prose as a carrier for hidden communication. Unlike traditional encryption, which transforms a message into an unrecognizable string of gibberish, LLM-based steganography produces a “cover text” that looks entirely mundane to a human reader—and even to most AI detectors—while smuggling a secret payload beneath the surface.

This isn’t merely a technical curiosity. The ability to embed covert channels within AI-generated text creates a significant blind spot for cybersecurity frameworks. When a model can hide a password, a command, or a piece of sensitive data inside a seemingly innocent email about a quarterly budget meeting, the traditional tools we use to monitor data exfiltration become largely obsolete.

The Mechanics of Linguistic Cloaking

To understand how an LLM hides a message, it helps to think about how these models actually “write.” An LLM doesn’t choose a word because it understands a concept; it predicts the next token based on a probability distribution. For any given sentence, there are often multiple words that are statistically plausible. For example, if a model is completing the phrase “The weather today is…”, the tokens “sunny,” “pleasant,” and “beautiful” might all have high probability scores.

In text-in-text steganography, the model uses this flexibility as a coding system. By subtly shifting the choice of a token based on the binary sequence of a secret message, the LLM can encode information without altering the semantic meaning or the grammatical correctness of the sentence. If the secret bit is a ‘0’, the model might pick “sunny”; if it’s a ‘1’, it picks “pleasant.”
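
The bit-driven token choice described above can be sketched in a few lines. This is a toy model, not a real system: the hand-written token pairs below stand in for an actual LLM's top two candidates at each position, with bit 0 selecting the most likely token and bit 1 the runner-up.

```python
def bits_from_bytes(payload: bytes):
    """Yield the payload one bit at a time, most significant bit first."""
    for byte in payload:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

def encode(secret: bytes, choice_points):
    """choice_points: list of (most_likely, runner_up) token pairs,
    one per position where the 'model' is free to vary its wording."""
    bits = bits_from_bytes(secret)
    out = []
    for top, alt in choice_points:
        bit = next(bits, None)
        if bit is None:          # payload exhausted: default to top choice
            out.append(top)
        else:
            out.append(alt if bit else top)
    return " ".join(out)

# Hypothetical choice points for a short weather report.
points = [("sunny", "pleasant"), ("warm", "mild"),
          ("calm", "gentle"), ("clear", "bright"),
          ("dry", "crisp"), ("breezy", "airy"),
          ("blue", "pale"), ("nice", "fine")]

cover = encode(b"\xa5", points)  # 0xa5 = 0b10100101
print(cover)  # pleasant warm gentle clear dry airy blue fine
```

Every output word is a plausible weather adjective, so the cover text reads naturally, yet the exact sequence of choices carries one full byte.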

Because the resulting text remains within the bounds of natural language, the “stego-text” passes the eye test. To a human, it’s just a weather report. To a recipient with the correct decoding key—essentially a mirror of the model’s probability distribution—the secret message is easily extracted by analyzing which tokens were selected over their alternatives.
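
The extraction step can be sketched the same way. Assuming sender and receiver share a hypothetical list of ranked token pairs (standing in for a shared model's probability distribution), the receiver recovers each bit by checking whether the top choice or the runner-up appears.

```python
def decode(stego_text: str, choice_points) -> bytes:
    """Recover the hidden bits by comparing each token against the
    shared ranked pairs: top choice means 0, runner-up means 1."""
    bits = []
    for token, (top, alt) in zip(stego_text.split(), choice_points):
        bits.append(1 if token == alt else 0)
    # Pack bits back into bytes (MSB first), dropping any partial byte.
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

# The same hypothetical choice points the sender used.
points = [("sunny", "pleasant"), ("warm", "mild"),
          ("calm", "gentle"), ("clear", "bright"),
          ("dry", "crisp"), ("breezy", "airy"),
          ("blue", "pale"), ("nice", "fine")]

secret = decode("pleasant warm gentle clear dry airy blue fine", points)
print(secret)  # b'\xa5'
```

Note that the scheme fails if the two sides disagree on the ranked pairs, which is exactly the model-drift problem discussed later in the piece.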

Encryption vs. Steganography: A Critical Distinction

There is a common misconception that steganography is simply another form of encryption. In reality, they solve two different problems. Encryption focuses on confidentiality—making the content unreadable. Steganography focuses on imperceptibility—making the existence of the communication unknown.

When a security system sees an encrypted packet, it knows something is being hidden, even if it can’t read it. This often triggers a red flag, especially in highly regulated corporate or government networks. Steganography bypasses this “detection of existence.” By masking the secret data as a standard LLM output, the communication blends into the background noise of the modern digital workplace, where AI-generated summaries and emails are becoming the norm.

Comparison of Data Concealment Methods

Feature           | Traditional Encryption         | Digital Steganography (Images) | LLM Text Steganography
Visibility        | Obvious (ciphertext)           | Hidden (noise)                 | Hidden (natural language)
Primary Goal      | Unreadability                  | Undetectability                | Undetectability
Detection Method  | Entropy analysis               | Pixel variance                 | Statistical distribution
Carrier           | None (the data is the message) | Media files (.jpg, .wav)       | Syntactic structures

The Security Gap and the ‘Cat-and-Mouse’ Game

The primary stakeholders currently grappling with this technology are cybersecurity firms and Data Loss Prevention (DLP) teams. Most current DLP tools look for patterns—social security numbers, credit card formats, or known “forbidden” keywords. They are not designed to analyze the probabilistic likelihood of a word choice.

This creates several high-risk scenarios:

  • Covert Exfiltration: An insider could use an LLM to leak proprietary source code by embedding it into a series of innocuous-looking project updates.
  • C&C Communication: Malware could receive instructions from a Command-and-Control (C&C) server via a public-facing AI chatbot, where the instructions are hidden in the bot’s responses.
  • Bypassing AI Guardrails: Users might find ways to communicate “forbidden” prompts or responses to one another by encoding them in a way that bypasses the safety filters of the LLM provider.

The countermeasure to this is known as steganalysis. Researchers are currently developing tools that can detect statistical anomalies in text. If a piece of writing consistently chooses the second-most-likely token over the first-most-likely token, it suggests a non-random pattern that could indicate a hidden message. However, as LLMs become more sophisticated, the gap between the most likely and second-most-likely tokens narrows, making hidden messages even harder to spot.
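
A toy sketch of this style of detection, again using hypothetical ranked token pairs in place of a real model's distribution: a text that picks the runner-up token far more often than ordinary writing would is flagged as suspicious. The function name and the pairs are illustrative assumptions, not a real detector.

```python
def runner_up_rate(text: str, choice_points) -> float:
    """Fraction of positions where the less likely token was chosen.
    A rate far above a model's natural baseline is a steganalysis signal."""
    hits = sum(1 for tok, (top, alt) in zip(text.split(), choice_points)
               if tok == alt)
    return hits / len(choice_points)

points = [("sunny", "pleasant"), ("warm", "mild"),
          ("calm", "gentle"), ("clear", "bright")]

innocent = "sunny warm calm clear"        # always the top token
suspect  = "pleasant mild gentle bright"  # always the runner-up

print(runner_up_rate(innocent, points))   # 0.0
print(runner_up_rate(suspect, points))    # 1.0
```

In practice a real detector compares observed token choices against a full probability model rather than a binary top-versus-runner-up count, which is why narrowing probability gaps make the signal weaker.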

What Remains Unknown

While the proof-of-concept for LLM steganography is robust, we are still in the early stages of understanding its scalability. It is currently unclear how much “payload” a standard paragraph can carry before the text starts to sound unnatural or “uncanny.” There is also a significant question regarding model drift: if the sender uses GPT-4 and the receiver uses a slightly different version or a different model entirely, the probability distributions may not align, leading to corrupted secret messages.

Meanwhile, the industry has yet to establish a standardized watermarking system that can definitively distinguish between a naturally generated AI response and one that has been manipulated for steganography.

The next major milestone in this field will be the release of updated benchmarks for AI-driven steganalysis, expected to be discussed in upcoming cybersecurity forums and academic peer reviews throughout the next year. As these detection tools evolve, the battle for the “hidden layer” of the internet will only intensify.

Do you think AI-generated text should be mandated to carry a digital watermark to prevent this kind of covert communication? Let us know in the comments or share this story with your network.
