According to CoinWorld News, StepFun (Jieyue Xingchen) has released StepAudio 2.5 ASR, its new-generation automatic speech recognition model, which is now fully available on the company's open platform. The release is described as the first to bring multi-token prediction (MTP), a technique from large language models, into speech recognition, which significantly increases inference speed. The model also directly reuses the large model's 32K context window, so it accepts up to 30 minutes of complete audio as a single end-to-end input instead of segmenting long recordings and splicing the transcripts together. In a 30-minute full-load input test, accuracy did not degrade over the length of the recording, and the model's overall error rate across ten authoritative open-source Chinese and English test sets, including LibriSpeech, is reported to be lower than that of competing products.
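
To put the 32K-window and 30-minute figures side by side, here is a rough back-of-the-envelope sketch. It assumes "32K" means 32 × 1024 tokens and that the whole window is shared between audio tokens and any reserved text tokens; the function name and the `reserved_for_text` parameter are hypothetical, and the report does not state the model's actual audio token rate.

```python
# Assumption: "32K" is read as 32 * 1024 tokens; the report does not specify.
CONTEXT_TOKENS = 32 * 1024
# 30 minutes of audio, in seconds (figure taken from the report).
AUDIO_SECONDS = 30 * 60


def max_tokens_per_second(context_tokens: int,
                          audio_seconds: int,
                          reserved_for_text: int = 0) -> float:
    """Upper bound on the audio token rate that still fits in the window.

    reserved_for_text is a hypothetical budget held back for the prompt
    and the transcript; it is not a figure from the report.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if not 0 <= reserved_for_text < context_tokens:
        raise ValueError("reserved_for_text must be in [0, context_tokens)")
    return (context_tokens - reserved_for_text) / audio_seconds


if __name__ == "__main__":
    # Whole window spent on audio.
    print(round(max_tokens_per_second(CONTEXT_TOKENS, AUDIO_SECONDS), 1))
    # Same window with 4096 tokens held back for text (illustrative value).
    print(round(max_tokens_per_second(CONTEXT_TOKENS, AUDIO_SECONDS, 4096), 1))
```

Under these assumptions, fitting 30 minutes into one pass implies an average of at most roughly 18 tokens per second of audio, and fewer once room is set aside for the transcript itself.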
