Anyone who has spent significant time talking to a modern AI assistant knows the “voice lag.” It is that subtle, awkward pause—the digital equivalent of a person blinking slowly while trying to remember where they left off—that reminds you that you are interacting with a machine, not a human. You speak, you wait for the model to process, and then it responds. If you try to interrupt it, the system often stutters or ignores you until it finishes its pre-planned sentence.
Thinking Machines Lab, the AI startup founded last year by former OpenAI CTO Mira Murati, is attempting to kill that lag. On Monday, the company announced the development of “interaction models,” a new architectural approach designed to allow AI to listen and speak simultaneously. The goal is to move away from the turn-based nature of current AI and toward something that feels less like a series of text messages read aloud and more like a natural phone call.
At the heart of this announcement is a concept called “full duplex” communication. In traditional AI voice interfaces, the system operates in half-duplex mode: it is either in “listen mode” or “speak mode.” Thinking Machines is building a model that processes input and generates output at the same time. This means the AI doesn’t just wait for you to stop talking to start thinking; it is constantly monitoring the conversation, allowing it to be interrupted or to react to a change in the user’s tone or pace in real time.
For those of us who spent years in software engineering before moving into reporting, this is a critical distinction. Most current “voice” AI is actually a pipeline of three separate models: an automatic speech recognition (ASR) system to turn speech into text, a large language model (LLM) to process that text, and a text-to-speech (TTS) engine to voice the answer. Each step adds latency. By making interactivity native to the model itself, Thinking Machines aims to collapse that pipeline.
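To make the latency arithmetic concrete, here is a deliberately simplified Python sketch of that three-stage pipeline. The function names and the sleep durations are hypothetical placeholders, not measurements of any real system; the point is only that the stages run sequentially, so their delays add up.

```python
import time

# Hypothetical stand-ins for the three pipeline stages; the timings are invented.
def transcribe(audio):            # ASR: speech -> text
    time.sleep(0.3)
    return "what's the weather like?"

def generate_reply(text):         # LLM: text -> text
    time.sleep(0.5)
    return "It should stay sunny all afternoon."

def synthesize(text):             # TTS: text -> audio
    time.sleep(0.4)
    return b"<audio bytes>"

start = time.time()
synthesize(generate_reply(transcribe(b"<mic input>")))
print(f"end-to-end latency: {time.time() - start:.2f}s")  # ~1.2s: the delays stack
```

An integrated interaction model, as Thinking Machines describes it, aims to remove those hand-offs entirely rather than merely speed each stage up.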
The race for the 400-millisecond threshold
The primary metric Thinking Machines is touting is speed. The company claims its latest model, TML-Interaction-Small, responds in 0.40 seconds. To the average user, four-tenths of a second might seem instantaneous, but in the world of human linguistics, it is the “Goldilocks zone” for conversation. Natural human conversation typically features gaps of about 200 to 400 milliseconds between turns.

When a model takes a full second or two to respond, the human brain registers a delay, which triggers a cognitive shift—we stop feeling like we are in a flow and start feeling like we are operating a tool. By hitting the 0.40-second mark, Thinking Machines is positioning itself to bypass the “uncanny valley” of voice interaction, claiming speeds that are significantly faster than comparable offerings from industry giants like Google and OpenAI.
| Feature | Standard Voice AI (Half-Duplex) | Interaction Models (Full Duplex) |
|---|---|---|
| Communication Flow | Turn-based (Listen → Process → Speak) | Simultaneous (Listen & Speak) |
| Interruption Handling | Often delayed or requires “wake word” | Native, real-time interruption |
| Architecture | Pipeline (ASR → LLM → TTS) | Integrated/Native Interactivity |
| User Experience | Walkie-talkie style | Phone call style |
Native intelligence vs. bolted-on features
“Interruptibility” is not a new feature. OpenAI’s GPT-4o and Google’s Gemini Live both offer versions of voice interaction that allow users to cut in. However, these are often implemented as a layer of “voice activity detection” (VAD) bolted on top of the model. The system is essentially listening for a specific sound threshold to trigger a “stop” command to the TTS engine.
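As a rough illustration of what “bolted on” means in practice, the toy sketch below uses a fixed energy threshold to decide when to stop playing back a reply that has already been fully generated. The threshold value and the frame handling are invented for this example; production VAD is far more sophisticated, but the shape is the same: the detector sits outside the model and merely cuts the audio off.

```python
import numpy as np

ENERGY_THRESHOLD = 0.02  # arbitrary value, chosen only for illustration

def user_is_speaking(mic_frame: np.ndarray) -> bool:
    # Crude voice activity detection: compare mean frame energy to a fixed threshold.
    return float(np.mean(mic_frame ** 2)) > ENERGY_THRESHOLD

def speak_with_interruption(tts_frames, mic_frames):
    """Play a pre-generated reply, stopping as soon as the user speaks up."""
    for i, (tts_frame, mic_frame) in enumerate(zip(tts_frames, mic_frames)):
        if user_is_speaking(mic_frame):
            return f"interrupted after {i} frames"
        # ... send tts_frame to the speaker here ...
    return "completed"

# Toy demo: a quiet microphone for 5 frames, then the user starts talking.
silence = np.zeros(160)
speech = np.random.default_rng(0).normal(0, 0.3, 160)
mic = [silence] * 5 + [speech] * 5
tts = [np.ones(160)] * 10
print(speak_with_interruption(tts, mic))  # -> "interrupted after 5 frames"
```

Note that nothing in this loop ever feeds the user's audio back into the model; the reply itself cannot change, it can only be silenced.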

Thinking Machines is arguing that interactivity should be native. In their view, the ability to listen while talking isn’t a feature you add to a model; it’s a fundamental way the model should process information. If the AI is natively aware of the audio stream while it is generating its own, it can adjust its inflection, speed, or content based on the user’s immediate reaction—such as a sigh of confusion or a quick “yes, exactly”—without having to fully stop and restart its processing cycle.
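By contrast, a natively full-duplex loop is structurally different: incoming and outgoing audio move through the same model on every step, so the next output frame can always depend on the most recent input frame. The toy sketch below shows only that control-flow shape; the ToyDuplexModel and its single-float “frames” are invented for illustration and say nothing about Thinking Machines’ actual architecture.

```python
class ToyDuplexModel:
    """Invented stand-in: real interaction models are vastly more sophisticated."""

    def step(self, mic_frame: float) -> float:
        # Produce the next output frame *while* observing the latest input frame,
        # so a loud interjection from the user immediately changes the output.
        return 0.0 if abs(mic_frame) > 0.5 else 0.1  # go quiet if the user speaks up

def full_duplex_loop(model, mic_stream):
    # Listening and speaking happen in the same loop: there is no hand-off between
    # separate "listen mode" and "speak mode" phases, and no stop-and-restart cycle.
    return [model.step(frame) for frame in mic_stream]

# The model keeps talking (0.1) until the user interjects at frame 3, then yields (0.0).
print(full_duplex_loop(ToyDuplexModel(), [0.0, 0.0, 0.9, 0.0]))
```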
This shift could have massive implications for accessibility and specialized use cases. Imagine an AI tutor that can tell when a student is hesitating and offer a hint before the student even asks, or a cybersecurity assistant that can react instantly to a technician’s verbal correction during a high-pressure system failure.
The road from research to release
Despite the impressive benchmarks, Thinking Machines is tempering expectations by clarifying that this is a research preview, not a consumer product. The company is not releasing the model to the general public immediately. Instead, they have outlined a phased rollout strategy:
- Current Stage: Internal research and benchmark validation.
- Next Phase: A “limited research preview” expected to launch in the coming months for a select group of testers.
- Final Goal: A wider release scheduled for later this year.
The primary unknown remains the “real-world” experience. Benchmarks in a controlled environment rarely capture the chaos of human speech—the stutters, the background noise, and the overlapping conversations that define real interaction. Whether TML-Interaction-Small can distinguish between a meaningful interruption and a dog barking in the background will be the true test of its “native” intelligence.

As a former engineer, I tend to be skeptical of “faster” claims until I can feel the latency myself. The technical ambition here is high, and given Mira Murati’s track record in scaling the most influential models of the last three years, the industry is paying close attention. If Thinking Machines can deliver on the promise of a truly fluid, full-duplex conversation, the “voice assistant” as we know it—a polite but rigid tool—may finally evolve into a genuine conversational partner.
The next confirmed milestone for the project is the launch of the limited research preview, which the company expects to deploy within the next few months. Further technical documentation and expanded benchmarks are expected to accompany that release on the Thinking Machines official blog.
Do you think native interruption will make AI feel more human, or just more intrusive?
