Whisper and Gemini 3 Pro outperformed by nearly 30% in complex acoustic environments? This time Mega-ASR, a base model built on Qwen3-ASR 1.7B, has real substance: hallucinations and dropped words are finally being taken seriously.

MeNews
National University of Singapore and Nanyang Technological University have open-sourced Mega-ASR, a model designed to reduce hallucinations and word omissions in ASR under extreme noise conditions.
ME News Report, May 22 (UTC+8): According to Beating Monitoring, teams from the National University of Singapore, Nanyang Technological University, and the Shanghai Artificial Intelligence Laboratory have jointly open-sourced Mega-ASR, which they describe as the first all-scenario robust speech recognition base model. It targets common failure modes of real-world speech recognition, such as hallucinated text, omitted words, and blank outputs. Mega-ASR is built on Qwen3-ASR 1.7B and, according to the team, outperforms models such as Whisper, Gemini 3 Pro, and Seed-ASR by nearly 30% in extremely complex acoustic environments. The project is available on GitHub, with all code and model weights released under the Apache-2.0 license. The research team also built a dataset of 2.4 million samples totaling about 11,000 hours.
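The report does not say which metric the "nearly 30%" refers to or how hallucinations and omissions are scored. As a rough illustration only, the sketch below assumes word error rate (WER), the standard ASR metric. It aligns a reference and a hypothesis transcript by word-level edit distance and splits the errors into substitutions, deletions and insertions. Reading hallucinations as insertions, word omissions as deletions, and a blank output as "every word deleted" is an interpretive assumption, not Mega-ASR's evaluation method. The names `WerBreakdown`, `wer_breakdown` and `relative_reduction` are hypothetical, and the WER figures in the usage lines are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WerBreakdown:
    substitutions: int
    deletions: int   # reference words missing from the output (assumed: word omissions)
    insertions: int  # output words with no reference counterpart (assumed: hallucinations)
    ref_len: int

    @property
    def wer(self) -> float:
        errors = self.substitutions + self.deletions + self.insertions
        # Empty reference: any output at all counts as a full error.
        return errors / self.ref_len if self.ref_len else float(errors > 0)


def wer_breakdown(reference: str, hypothesis: str) -> WerBreakdown:
    """Word-level Levenshtein alignment that also tracks the error types."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dp[i][j] = (total_cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # hypothesis empty: everything deleted
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # reference empty: everything inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # exact match costs nothing
                continue
            c, s, d, ins = dp[i - 1][j - 1]
            sub = (c + 1, s + 1, d, ins)
            c, s, d, ins = dp[i - 1][j]
            dele = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insert = (c + 1, s, d, ins + 1)
            dp[i][j] = min(sub, dele, insert)  # lowest total cost wins
    _, s, d, ins = dp[n][m]
    return WerBreakdown(s, d, ins, n)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative improvement, e.g. 0.30 means 30% fewer errors than the baseline."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    ref = "turn left at the next junction"
    blank = wer_breakdown(ref, "")
    print(blank.deletions, blank.wer)  # 6 1.0
    hallu = wer_breakdown(ref, "turn left at the next junction thank you for watching")
    print(hallu.insertions, round(hallu.wer, 3))  # 4 0.667
    # Hypothetical WERs: a drop from 10% to 7% is a relative reduction of about 0.30.
    print(relative_reduction(0.10, 0.07))
```

The breakdown matters because a single WER number hides *how* a model fails. A model that emits fluent but invented text and one that silently drops words can have similar WERs while behaving very differently in practice.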