Former OpenAI CTO challenges her old employer: new model works in 200ms micro-turns, beats GPT-Realtime on latency

According to Beating Monitoring, Thinking Machines, the lab founded by former OpenAI CTO Mira Murati, has released a research preview of its “Interactive Model.” Rather than stitching speech and text together with external tools, the new system handles real-time audio and video interaction natively. It takes in input continuously in 200ms “micro-turns,” which lets it listen, watch, and speak at the same time and lets users interrupt it mid-response.
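To make the 200ms figure concrete, here is a minimal Python sketch of cutting an incoming audio stream into micro-turn-sized chunks. Only the 200ms length comes from the report; the 16 kHz sample rate and the `micro_turns` helper are assumptions made for illustration and do not describe how Thinking Machines actually frames its input.

```python
from typing import Iterator, Sequence

MICRO_TURN_MS = 200        # micro-turn length reported in the article
SAMPLE_RATE_HZ = 16_000    # assumed sample rate, not stated in the article


def micro_turns(
    samples: Sequence[int],
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    turn_ms: int = MICRO_TURN_MS,
) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `samples`, each covering one micro-turn."""
    step = sample_rate_hz * turn_ms // 1000  # samples per micro-turn
    for start in range(0, len(samples), step):
        # The final slice may be shorter if the stream ends mid-turn.
        yield samples[start:start + step]


if __name__ == "__main__":
    half_second = [0] * (SAMPLE_RATE_HZ // 2)  # 0.5 s of silence
    print([len(chunk) for chunk in micro_turns(half_second)])
```

Under these assumptions each micro-turn carries 3,200 samples, so the model would get a fresh slice of input five times per second instead of waiting for the speaker to finish.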

The first model shown, TML-Interaction-Small, uses a 276-billion-parameter MoE (mixture-of-experts) architecture with 12 billion parameters active at a time, about 4% of the total. To fix a flaw of conventional large models, which stop perceiving while they generate an answer, the team split the system into a front end and a back end: the front-end model keeps the conversation going without interruption, while the back-end model handles complex reasoning, web searches, or UI generation and streams its results back to the front end.
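The front-end/back-end split can be pictured with a small Python simulation. This is a sketch under stated assumptions, not the lab's implementation: the `Job` class, the `run_front_end` function, the rule that anything ending in a question mark is delegated, and the fixed back-end duration are all hypothetical. Each loop iteration stands for one micro-turn.

```python
from dataclasses import dataclass


@dataclass
class Job:
    request: str
    turns_left: int  # micro-turns until the simulated back end finishes


def run_front_end(chunks: list[str], back_end_turns: int = 2) -> list[str]:
    """Step through micro-turns; the front end never waits on the back end."""
    jobs: list[Job] = []
    log: list[str] = []
    for chunk in chunks:
        # Perception continues every micro-turn, even with jobs in flight.
        log.append(f"heard: {chunk}")
        # Advance simulated back-end work by one micro-turn.
        for job in jobs:
            job.turns_left -= 1
        # Stream finished results back into the conversation.
        for job in [j for j in jobs if j.turns_left <= 0]:
            log.append(f"result: {job.request}")
        jobs = [j for j in jobs if j.turns_left > 0]
        # Hypothetical routing rule: questions are delegated to the back end.
        if chunk.endswith("?"):
            jobs.append(Job(chunk, back_end_turns))
    return log


if __name__ == "__main__":
    for line in run_front_end(["weather?", "also", "thanks"]):
        print(line)
```

In this toy version the front end logs "also" and "thanks" while the question is still being worked on, and the answer is slotted in once the back end finishes. Jobs still pending when the input ends are simply dropped, which a real system would need to handle.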

The company says this architecture gives it an edge in response speed. According to official figures, its speech turn-taking latency is just 0.40 seconds and it scores 77.8 on FD-bench V1.5, beating GPT-realtime-2.0 and Gemini 3.1 Flash Live on both metrics. However, continuous audio and video processing uses up context capacity quickly, and low-latency performance depends heavily on network conditions. Thinking Machines plans to open a limited preview in the coming months.
