Sakana AI launches the KAME system, achieving deeper knowledge injection with near-zero latency


AIMPACT News, May 3rd (UTC+8): Sakana AI has launched KAME, a hybrid architecture that injects knowledge from a backend LLM in real time while maintaining near-zero latency. The system consists of two asynchronous components running in parallel. The front end, built on the Moshi architecture's speech-to-speech (S2S) module, processes audio roughly every 80 milliseconds and generates responses immediately. The backend pairs a speech-to-text (STT) component with a full LLM; it continuously builds partial transcriptions and streams "oracle" text back to the front end, which can correct a response mid-stream when better oracle data arrives. In evaluation, Moshi alone scored 2.05, KAME+gpt-4.1 scored 6.43, and KAME+claude-opus-4-1 scored 6.23, all at latency comparable to Moshi's; a leading system, Unmute, scored higher at 7.70 but with 2.1 seconds of latency. Because the KAME backend is independent of the front end, the LLM can be swapped at inference time without retraining.
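The two-lane design described above can be sketched with Python's asyncio: a fast loop that answers every frame immediately, and a slow loop whose "oracle" output is merged in whenever it arrives. This is an illustrative toy, not Sakana AI's implementation; all names (`front_end`, `backend_llm`, the queue layout) and the 80 ms / 300 ms timings are assumptions for demonstration.

```python
import asyncio

async def backend_llm(transcript_q: asyncio.Queue, oracle_q: asyncio.Queue):
    """Slow path (hypothetical): stands in for STT + full LLM.
    Consumes partial transcripts and streams oracle text back."""
    while True:
        partial = await transcript_q.get()
        await asyncio.sleep(0.3)               # stand-in for LLM latency
        await oracle_q.put(f"oracle({partial})")

async def front_end(chunks, transcript_q: asyncio.Queue, oracle_q: asyncio.Queue):
    """Fast path (hypothetical): handles one audio chunk per ~80 ms frame
    and answers immediately, switching to oracle text when available."""
    replies = []
    for chunk in chunks:
        await transcript_q.put(chunk)          # feed the backend asynchronously
        try:
            guidance = oracle_q.get_nowait()   # non-blocking: never stall the frame
        except asyncio.QueueEmpty:
            guidance = None
        # Respond now: use oracle knowledge if it arrived, else a fast draft.
        replies.append(guidance or f"draft({chunk})")
        await asyncio.sleep(0.08)              # ~80 ms frame cadence
    return replies

async def main():
    transcript_q, oracle_q = asyncio.Queue(), asyncio.Queue()
    backend = asyncio.create_task(backend_llm(transcript_q, oracle_q))
    replies = await front_end([f"chunk{i}" for i in range(8)], transcript_q, oracle_q)
    backend.cancel()
    return replies

replies = asyncio.run(main())
```

The key property the sketch shows is that the front end never blocks on the backend: early replies are fast drafts, and later frames pick up oracle text once the slower loop catches up, which also makes the backend trivially swappable.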
