ByteDance releases Seeduplex, a full-duplex large speech model; AI voice interaction enters the era of "listening while speaking"


AIMPACT News, April 9th. ByteDance's Seed team has released Seeduplex, a native full-duplex large speech model, and rolled it out fully on the Doubao app, marking an upgrade of voice interaction from turn-based exchanges to real-time natural conversation.


Seeduplex processes listening and speaking simultaneously by jointly modeling speech and semantics, significantly improving robustness to interference in complex acoustic environments. According to the reported data, its false-reply rate and false-interruption rate are both about 50% lower than those of traditional half-duplex systems.


On the interaction side, the model introduces dynamic end-of-turn detection, cutting response latency by roughly 250 milliseconds, reducing talk-over incidents by 40%, and more accurately distinguishing mid-sentence pauses from the actual end of a user's turn. Through speculative sampling and quantization optimizations, the system also maintains low latency and smooth responses under high concurrency, with overall call satisfaction reportedly up by about 8.34%.
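The article does not describe how Seeduplex's end-of-turn detection works internally. As a rough illustration of the general idea, the sketch below combines silence duration with a semantic-completeness score, so that a short mid-sentence pause does not trigger a premature reply while a finished request gets a fast response. All names and thresholds here are illustrative assumptions, not Seeduplex internals.

```python
# Hypothetical sketch of dynamic end-of-turn detection: instead of a
# fixed silence timeout, adapt the timeout to how complete the
# utterance sounds so far. Thresholds are made-up placeholder values.

def end_of_turn(silence_ms: float, semantic_completeness: float) -> bool:
    """Decide whether the user has finished their turn.

    silence_ms: how long the user has currently been silent.
    semantic_completeness: score in [0, 1] estimating whether the
        utterance so far forms a complete request (in practice this
        would come from a lightweight language-model head).
    """
    # A semantically complete utterance needs only a short silence;
    # an incomplete one gets a longer grace period before replying.
    threshold_ms = 200 if semantic_completeness > 0.8 else 700
    return silence_ms >= threshold_ms

# Mid-sentence pause (incomplete utterance, 400 ms silent): keep listening.
print(end_of_turn(400, 0.3))   # False
# Complete question followed by 250 ms of silence: respond promptly.
print(end_of_turn(250, 0.95))  # True
```

A real system would feed this decision from a streaming speech encoder rather than two scalar inputs, but the adaptive-threshold structure is what lets the assistant respond faster without cutting the user off.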


This upgrade signals that AI voice is evolving toward real-time, multimodal, human-like interaction; the model is expected to gain visual capabilities in the future, pushing intelligent assistants toward an integrated ability to "listen, see, think, and speak." (Source: ByteDance)


