
Thinking Machines Unveils Full-Duplex AI Model for Real-Time Human Interaction

Tags: AI interactions, full-duplex communication, multimodal AI, TML-Interaction-Small, low latency, natural language processing
May 12, 2026
Viqus Verdict: 8/10
A Leap to Collaboration Mode
Media Hype 6/10
Real Impact 8/10

Article Summary

Thinking Machines, founded by former OpenAI CTO Mira Murati, announced a significant research preview for its 'Interaction Models,' designed to overcome the latency and turn-taking limitations of current generative AI systems. These models use a new 'full-duplex' architecture, allowing the AI to simultaneously listen, see, and talk by processing inputs and outputs in rapid, 200-millisecond chunks. The system features a core component, TML-Interaction-Small, which manages dialogue while an asynchronous Background Model handles complex reasoning, web searches, and tool calls in parallel. By utilizing 'encoder-free early fusion,' the architecture minimizes latency, achieving superior real-time performance on benchmarks compared to leading competitors. The most profound implications are seen in high-stakes enterprise applications, such as real-time monitoring in manufacturing or medical settings, and improving the perceived naturalness of virtual customer service.
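The chunked, dual-model design described above can be sketched as a simple event loop: a small dialogue model answers every 200-millisecond window immediately, while slow reasoning is handed off to an asynchronous background worker and merged back in whenever it finishes. All names here (`InteractionModel`, `background_task`, and so on) are hypothetical illustrations under stated assumptions, not Thinking Machines' actual API:

```python
import asyncio
from collections import deque

CHUNK_MS = 200  # the article's reported chunk duration


class InteractionModel:
    """Hypothetical stand-in for the small dialogue model (TML-Interaction-Small)."""

    def respond(self, chunk, background_results):
        # Produce the next 200 ms of output, weaving in any finished
        # background work (search results, tool output) if available.
        note = f" [+{background_results.popleft()}]" if background_results else ""
        return f"reply-to({chunk}){note}"

    def needs_deep_reasoning(self, chunk):
        # Decide whether to hand this input off to the Background Model.
        return "search" in chunk


async def background_task(chunk, results):
    # Stand-in for the asynchronous Background Model: heavy reasoning,
    # web searches, and tool calls run off the latency-critical path.
    await asyncio.sleep(0.5)  # several chunks' worth of latency
    results.append(f"result-for({chunk})")


async def full_duplex_loop(input_chunks):
    model = InteractionModel()
    results = deque()  # finished background work, consumed by the dialogue model
    pending = set()    # in-flight background tasks
    outputs = []
    for chunk in input_chunks:  # one iteration per 200 ms input window
        if model.needs_deep_reasoning(chunk):
            pending.add(asyncio.create_task(background_task(chunk, results)))
        # The dialogue model never blocks on deep reasoning: it always
        # emits something for the current window.
        outputs.append(model.respond(chunk, results))
        await asyncio.sleep(CHUNK_MS / 1000)  # wait for the next input window
    await asyncio.gather(*pending)
    return outputs


# outputs = asyncio.run(full_duplex_loop(["hi", "search flights", "ok"]))
```

The point of the sketch is the decoupling: the per-chunk response path contains no slow calls, so perceived latency stays at one chunk regardless of how long the background reasoning takes.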

Key Points

  • The new 'Interaction Models' achieve full-duplex communication, eliminating the noticeable pauses that currently interrupt natural human-AI dialogue.
  • The dual-model architecture separates immediate dialogue management (TML-Interaction-Small) from heavy-lifting reasoning (Background Model) for seamless, low-latency performance.
  • The low latency and real-time capability unlock critical enterprise use cases, including live safety monitoring and sophisticated time-sensitive data management in industrial settings.

Why It Matters

This breakthrough shifts the focus of generative AI from merely generating correct text to achieving natural, fluid, real-time collaboration. Raw speed gains are common; what stands out is the architectural solution of decoupling dialogue flow from deep reasoning, which addresses a fundamental human-computer interaction problem that has limited enterprise adoption. For high-stakes verticals like medicine or industrial automation, where milliseconds count, this level of low-latency responsiveness represents a significant competitive and operational advantage, accelerating AI's journey from novelty chatbot to genuine collaborator.
