Do artificial intelligence systems really have their own secret language?

U.S. researchers have raised the intriguing argument that the DALL-E 2 model may have invented its own secret language to understand object descriptions.

By: Aaron J. Snoswell, Postdoctoral Research Fellow, Computational Law and AI Accountability, Queensland University of Technology

A new generation of artificial intelligence (AI) models can produce "creative" images on demand from text prompts. Systems like Imagen, Midjourney and DALL-E 2 are beginning to change the way creative content is made, with implications for copyright and intellectual property.

While the output of these models is often impressive, it is difficult to know exactly how they produce their results (a problem that afflicts the entire deep learning field and remains unsolved; several companies, including IBM, claim to be developing systems that will provide explanations, but no such system has yet reached the market). Last week, U.S. researchers raised the intriguing argument that the DALL-E 2 model may have invented its own secret language to describe objects.

The researchers prompted DALL-E 2 to create images containing text captions, then fed those (gibberish) captions back into the system. From the results, they concluded that DALL-E 2 thinks Vicootes means "vegetables", while Wa ch zod rea refers to "sea creatures that a whale might eat". These claims are fascinating, and if true, they could have important security and interpretability implications for this kind of large AI model. So what exactly is going on?

Does DALL-E 2 have a secret language?

DALL-E 2 probably does not have a "secret language". It is perhaps more accurate to say it has a vocabulary of its own, but even that we cannot know for sure.

First of all, at this stage it is very difficult to verify any claims about DALL-E 2 and other large AI models, since only a handful of researchers and creative practitioners have access to them. Any images shared publicly (on Twitter, for example) should be taken with a fairly large grain of salt, because they have been "cherry-picked" by a person from among many output images generated by the AI.


Even those with access can use these models only in limited ways. For example, DALL-E 2 users can create or modify images, but cannot (yet) interact with the system more deeply, for instance by changing the behind-the-scenes code. This means that "explainable AI" methods for understanding how these systems work are not applicable, and systematically investigating their behavior is challenging.

So, what’s up?

One possibility is that the "gibberish" phrases are related to words from non-English languages. For example, Apoploe, which seems to create images of birds, is similar to the Latin Apodidae, the binomial name of a family of bird species. This seems like a plausible explanation: DALL-E 2 was trained on a very wide range of data scraped from the internet, which included many non-English words.

Similar things have happened before: large natural-language AI models have accidentally learned to write computer code without deliberate training.

Is it all about tokens?

One point in favor of this theory is the fact that AI language models do not read text the way humans do. Instead, they break input text into "tokens" before processing it.

Different tokenization approaches have different results. Treating each word as a token seems like an intuitive approach, but it causes trouble when identical tokens have different meanings (the way "match" means different things in a game of tennis and in starting a fire).

On the other hand, treating each character as a token produces a smaller number of possible tokens, but each one conveys much less meaningful information.

DALL-E 2 (and other models) use an intermediate approach called byte-pair encoding (BPE). Inspecting the BPE representations for some of the gibberish words suggests this could be an important factor in understanding the "secret language".
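To make the idea concrete, here is a minimal, self-contained sketch of how BPE-style tokenization splits an unfamiliar word into subword tokens. The merge table below is entirely hypothetical (real models learn tens of thousands of merge rules from their training corpus), but the greedy merge loop illustrates why a gibberish word like "Apoploe" still decomposes into subword pieces the model has seen before.

```python
def get_pairs(tokens):
    """Return the set of adjacent token pairs in a token sequence."""
    return set(zip(tokens, tokens[1:]))

def bpe_tokenize(word, merges):
    """Greedily apply learned merge rules (lowest rank = highest
    priority) to split a word into subword tokens, BPE-style."""
    tokens = list(word)  # start from individual characters
    while True:
        candidates = [p for p in get_pairs(tokens) if p in merges]
        if not candidates:
            break
        best = min(candidates, key=lambda p: merges[p])
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

# Hypothetical merge table (rank = learning order); real tokenizers
# learn these ranks by counting frequent pairs in a large corpus.
merges = {("a", "p"): 0, ("ap", "o"): 1, ("l", "o"): 2, ("p", "lo"): 3}

print(bpe_tokenize("apoploe", merges))  # → ['apo', 'plo', 'e']
```

The key property is that no input is ever "unknown": any string, gibberish included, gets mapped to some sequence of subword tokens, each of which carries associations learned during training.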

Not the whole picture

The "secret language" could also just be an example of the "garbage in, garbage out" principle. DALL-E 2 cannot say "I don't know what you are talking about", so it will always generate some image from the given input text.

Either way, none of these options is a complete explanation of what is happening. For example, removing individual characters from gibberish words appears to corrupt the generated images in very specific ways. And it seems that individual gibberish words do not necessarily combine to produce coherent compound images (as they would if there really were a secret "language" under the hood).

Why is it important?

Beyond the intellectual curiosity, you may be wondering if all this really matters.

The answer is yes. DALL-E 2's "secret language" is an example of an "adversarial attack" against a machine learning system: a way of breaking the system's intended behavior by deliberately choosing inputs the AI does not handle well.

One reason adversarial attacks matter is that they challenge our trust in the model. If the AI interprets gibberish words in unintended ways, it might also interpret meaningful words in unintended ways. This also raises security concerns: DALL-E 2 filters input text to prevent users from generating harmful or offensive content, but a "secret language" of gibberish words may allow users to bypass these filters.
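The filter-bypass risk can be sketched with a toy example. The blocklist and filter below are hypothetical, not DALL-E 2's actual moderation mechanism, but they show why any filter that checks the surface text of a prompt misses gibberish phrases that the model nonetheless maps onto a blocked concept.

```python
# Hypothetical word-level moderation blocklist (for illustration only).
BLOCKED = {"whale", "violence"}

def naive_filter(prompt):
    """Accept a prompt unless it contains a blocked word verbatim.
    Returns True if the prompt passes the filter."""
    return not any(word in BLOCKED for word in prompt.lower().split())

# The intended meaning is caught by the surface-text check...
print(naive_filter("a whale eating sea creatures"))  # → False (rejected)

# ...but a gibberish phrase the model may map to the same concept
# sails straight through, because no blocked word appears verbatim.
print(naive_filter("wa ch zod rea"))                 # → True (accepted)
```

This is why text-level filtering alone is a weak defense: the filter operates on words, while the model operates on learned associations between token sequences and imagery.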

A recent study found adversarial "trigger phrases" for certain language AI models: short nonsense phrases such as "zoning tapping fiennes" that can reliably cause the models to produce racist, harmful, or biased content. This study is part of an ongoing effort to understand and control how complex deep learning systems learn from data.

Finally, phenomena like DALL-E 2's "secret language" raise interpretability concerns. We want these models to behave as a human would expect, but seeing structured output in response to gibberish confounds our expectations.

Sheds light on existing concerns

You may remember the uproar in 2017 over some Facebook chatbots that "invented their own language". The current situation is similar in that the results are concerning, but not in the "Skynet has come to take over the world" sense.

Instead, the "secret language" of DALL-E 2 underscores existing concerns about the robustness, security, and interpretability of deep learning systems.

This article was originally published in The Conversation.
