Hidden Google App Menu Reveals New AI Models for Gemini Live

It is a familiar ritual for those of us who spent years in software engineering: digging through app binaries and server-side flags to find the features Google has built but isn’t yet ready to show the world. This time, the discovery is more than just a UI tweak. A hidden menu discovered in Google App v17.18.22 has pulled back the curtain on seven distinct AI models currently being tested for Gemini Live, suggesting a massive expansion of how Google’s voice AI thinks, listens, and adapts.

The discovery came not from an official leak, but from a persistent investigation into the Google app’s code. By enabling a previously unseen settings cog button, researchers uncovered a model selector menu, a tool typically reserved for internal engineers to swap out the “brains” of the AI in real time. While the interface remains unpolished and hidden behind a server-side flag, the implications for the upcoming Google I/O 2026 are significant.
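
Feature gates like this are usually trivial on the client side: the server flips a boolean, and the app reveals the UI. The sketch below shows the general pattern only; the flag name, config shape, and helper function are hypothetical, not strings recovered from the decompiled app.

```python
# Generic sketch of a server-side feature gate. The flag name and config
# structure are invented for illustration, not taken from the Google app.
def should_show_model_selector(flag_config: dict) -> bool:
    # The server pushes a config blob; the client checks a single boolean.
    # Flipping it remotely reveals the settings cog without an app update.
    return flag_config.get("gemini_live_model_selector_enabled", False)

print(should_show_model_selector({"gemini_live_model_selector_enabled": True}))
```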

Currently, the public version of Gemini Live relies on a single engine: Gemini 3.1 Flash Live. This model is a “native input” system, meaning it is designed to process raw audio and video streams directly. However, the presence of seven alternative models suggests that Google is moving away from a one-size-fits-all approach, preparing instead for a tiered system of voice intelligence that could vary based on the complexity of the task or the subscription level of the user.

Decoding the ‘A2A’ Architecture

To a casual observer, the strings of text in the hidden menu, like “A2A_Rev25_RC2”, look like alphabet soup. To a developer, they tell a specific story. The “A2A” prefix almost certainly stands for Audio-to-Audio, a critical distinction in the AI world. Traditional voice assistants typically use a three-step pipeline: Speech-to-Text (STT) transcribes the user, a Large Language Model (LLM) processes the resulting text, and Text-to-Speech (TTS) converts the response back into a voice.
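
In code, that classic pipeline is three serialized stages, each adding latency and discarding everything the transcript cannot carry. The stub functions below are hypothetical placeholders for real STT, LLM, and TTS services, not any vendor’s actual API:

```python
# Hypothetical three-stage voice turn (STT -> LLM -> TTS). The stubs stand in
# for real services; the point is the serialized structure.
def speech_to_text(audio: bytes) -> str:
    return "what's the weather like"        # stub: transcription

def llm_generate(prompt: str) -> str:
    return "It looks sunny all afternoon."  # stub: text-only reasoning

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")             # stub: synthesized speech

def classic_voice_turn(audio_in: bytes) -> bytes:
    text = speech_to_text(audio_in)   # stage 1: tone and emotion are lost here
    reply = llm_generate(text)        # stage 2: the model never "hears" the user
    return text_to_speech(reply)      # stage 3: more latency before playback

print(classic_voice_turn(b"\x00\x01"))
```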

A native A2A model skips the middleman. By processing audio directly, the AI can perceive nuance, tone, and emotion in a human voice and respond with similar fluidity, drastically reducing latency and making the conversation feel less like a series of commands and more like a natural dialogue. The “RC2” designation stands for “Release Candidate 2,” a technical milestone indicating that these models are in the final stages of testing and are nearing production readiness.
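
A native turn collapses those stages into a single streaming call: audio frames in, audio frames out. Again, this is a purely illustrative sketch of the idea rather than Google’s implementation:

```python
from typing import Iterator

class StubA2AModel:
    """Stand-in for a native audio model; it echoes frames to show the shape."""
    def stream(self, frame: bytes) -> Iterator[bytes]:
        yield frame  # a real model would emit reply audio, not an echo

def a2a_voice_turn(frames_in: Iterator[bytes],
                   model: StubA2AModel) -> Iterator[bytes]:
    # One model, one stream: audio in, audio out. Nuance survives, and the
    # reply can begin streaming before the user has finished speaking.
    for frame in frames_in:
        yield from model.stream(frame)

print(list(a2a_voice_turn([b"\x00", b"\x01"], StubA2AModel())))
```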

Reasoning, Personalization, and Codenames

Among the seven revealed models, two stand out for their potential impact on the user experience: the “Thinking” variant and the “P13n” model. The “A2A_Rev25_RC2_Thinking” model suggests that Google is bringing the same “reasoning” capabilities found in its high-end text models to the voice interface. In practice, this would mean a user could ask a complex, multi-step question and the AI would “pause” to reason through the logic before speaking, trading a snappy response for a more accurate, thoughtful one.
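
If that mirrors the text-side reasoning models, a “Thinking” turn would deliberate before emitting any audio. Here is a toy sketch of the trade-off, entirely invented, with the sleep standing in for silent reasoning time:

```python
import time

# Toy sketch of a "Thinking" turn: deliberate over sub-problems before
# answering, trading latency for a more considered reply.
def thinking_turn(question: str) -> str:
    sub_problems = [part.strip() for part in question.split(",")]
    for _ in sub_problems:
        time.sleep(0.1)  # the audible pause while the model reasons silently
    return f"Worked through {len(sub_problems)} steps before speaking."

print(thinking_turn("compare the two flights, check my calendar, pick one"))
```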

Then there is “A2A_Rev23_P13n.” In developer shorthand, “P13n” is a common abbreviation for “personalization.” This suggests a model specifically tuned to remember user preferences, behavioral patterns, and personal context without needing to reference a separate database for every turn of the conversation. It points toward a future where Gemini Live doesn’t just know the facts, but knows you.
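
One plausible shape for a “P13n” session is user context folded into the model’s state once at startup rather than fetched on every turn. The class and profile fields below are invented purely for illustration:

```python
# Hypothetical personalized session: the profile is folded into session state
# once, so later turns adapt without a database lookup per exchange.
class P13nSession:
    def __init__(self, profile: dict):
        self.context = dict(profile)  # baked in at session start

    def turn(self, query: str) -> str:
        units = self.context.get("units", "imperial")
        return f"[reply to '{query}' using {units} units]"

session = P13nSession({"units": "metric", "home_city": "Berlin"})
print(session.turn("how long is my ride to work?"))
```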

The most mysterious entries are “Capybara” and “Nitrogen.” Neither name appears in any known Google documentation. The existence of both a stable “Capybara” and an “Exp” (experimental) version indicates that Google is iterating on a completely new architectural approach for these specific models, though their exact purpose remains unknown.

Model Codename           | Technical Status    | Likely Primary Function
------------------------ | ------------------- | ---------------------------------------------------
A2A_Rev25_RC2_Thinking   | Release Candidate 2 | Enhanced reasoning and complex problem solving
A2A_Rev23_P13n           | Internal Test       | Deep user personalization and behavioral adaptation
A2A_Capybara / Exp       | Experimental        | Next-generation audio architecture testing
A2A_Nitrogen_Rev23       | Internal Test       | Performance or efficiency optimization

Why a Model Picker Matters

For the average user, the ability to switch models might seem unnecessary. However, it solves a fundamental tension in AI development: the trade-off between speed and intelligence. A “Flash” model is nearly instantaneous but can hallucinate or miss nuance. A “Thinking” model is more thorough and deliberate, but slower.

By building the infrastructure for a model selector, Google is positioning itself to offer a “Pro” voice experience. This could manifest as a subscription tier where users pay for the “Thinking” or “Personalized” models, or a toggle in the settings that allows users to choose between “Fast” and “Thorough” modes depending on whether they are setting a timer or debating philosophy.
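
Mechanically, such a toggle could be as simple as routing the session to a different model identifier. The mapping below is a guess; only the codenames themselves come from the leaked menu:

```python
# Illustrative routing between hidden-menu codenames based on a user toggle.
MODEL_FOR_MODE = {
    "fast": "A2A_Rev25_RC2",               # snappy default
    "thorough": "A2A_Rev25_RC2_Thinking",  # pauses to reason before speaking
}

def pick_model(mode: str) -> str:
    return MODEL_FOR_MODE.get(mode, MODEL_FOR_MODE["fast"])

print(pick_model("thorough"))
```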

While testing confirms that the selected model name is transmitted to Google’s servers at the start of a session, it is not yet clear if the functional differences are perceptible in the current build. The interface remains unpolished, suggesting this is currently an internal diagnostic tool rather than a consumer-facing feature.
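
For comparison, the public Gemini Live API opens each session with a setup message that names the model, so a hidden picker would plausibly only need to swap that one string. The payload below is loosely modeled on that pattern; the field names and values are illustrative, not captured traffic:

```python
import json

# Loosely modeled on the public Gemini Live API, where the first message of a
# session names the model. Fields and values here are illustrative only.
setup_message = {
    "setup": {
        "model": "models/A2A_Rev25_RC2_Thinking",  # codename from the hidden menu
        "generation_config": {"response_modalities": ["AUDIO"]},
    }
}
print(json.dumps(setup_message, indent=2))
```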

All eyes now turn to Google I/O 2026. If the “RC2” timeline holds, it is highly probable that we will see these models move from hidden code to public announcements later this month. Whether “Capybara” becomes a household name or remains a secret in the codebase, the shift toward native, multi-model audio intelligence is clearly underway.

Do you prefer a voice assistant that is instant and snappy, or one that takes a moment to think before it speaks? Let us know in the comments or share this story with your fellow tech enthusiasts.
