Xiaomi has released the MiMo-V2.5-TTS series of speech synthesis models, available through the MiMo open platform API and free for a limited time during the public beta. The series has three models: MiMo-V2.5-TTS, with built-in voices and a singing mode; MiMo-V2.5-TTS-VoiceDesign, which generates new voices (defined by age, gender, accent, and other dimensions) from a natural language prompt alone; and MiMo-V2.5-TTS-VoiceClone, which clones a voice from a few seconds of reference audio, preserving breath, rhythm, and pauses, with no training required. All three accept natural language instructions or audio tags to adjust emotion and style, and support Chinese, English, and several regional dialects. Output is sampled at 24,000 Hz, and PCM16 is the recommended format for streaming.

MeNews

2026-04-24 04:20:17

ME News Report, April 24 (UTC+8): According to Beating Monitoring, Xiaomi has released the MiMo-V2.5-TTS series of speech synthesis models. The models are served through the MiMo open platform API and are free for a limited time during the public beta. The series comprises three models, each targeting a different scenario. MiMo-V2.5-TTS ships with built-in high-quality voices and supports a singing mode that reproduces pitch and rhythm accurately. MiMo-V2.5-TTS-VoiceDesign generates entirely new voices from a natural language description, with no reference audio needed; the description can specify age, gender, accent, temperament, and other dimensions. MiMo-V2.5-TTS-VoiceClone replicates a target speaker's voice from a few seconds of reference audio, preserving breath, rhythm, and pauses, without training or fine-tuning. All three models accept natural language instructions that control speaking style, for example setting the emotion with descriptions such as “gentle but tired” or “tender in agitation,” and they also support audio tags (e.g., “inhale,” “laughter,” “choking”) for precise control. Supported languages include Chinese and English, as well as dialects such as Northeastern, Sichuan, Henan, and Cantonese. The audio output sampling rate is 24,000 Hz, and PCM16 is the recommended format for streaming output. (Source: BlockBeats)
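For readers who want to handle the reported output format, the following is a minimal Python sketch of wrapping a stream of raw PCM16 chunks at 24,000 Hz in a WAV container so that ordinary players can open it. It does not call the MiMo API, whose request format is not described in the report. The `fake_stream` generator is a hypothetical stand-in for a streaming response, and mono audio and whole-sample chunk boundaries are assumptions made for the sketch.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000   # Hz, as stated in the report
SAMPLE_WIDTH = 2       # bytes per sample for PCM16
CHANNELS = 1           # assumption: mono; the report does not specify


def write_pcm16_stream(chunks, target):
    """Wrap an iterable of raw PCM16 byte chunks in a WAV container.

    `target` is a file path or a writable binary file object.
    Assumes each chunk ends on a sample boundary.
    Returns the number of audio frames written.
    """
    frames = 0
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wav.writeframes(chunk)
            frames += len(chunk) // (SAMPLE_WIDTH * CHANNELS)
    return frames


def fake_stream(seconds=0.5, chunk_frames=2400, freq=440.0):
    """Hypothetical stand-in for a streaming response.

    Yields a sine tone as little-endian PCM16 in 100 ms chunks.
    """
    total = int(seconds * SAMPLE_RATE)
    for start in range(0, total, chunk_frames):
        end = min(start + chunk_frames, total)
        samples = (
            int(12000 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
            for n in range(start, end)
        )
        yield struct.pack(f"<{end - start}h", *samples)


if __name__ == "__main__":
    buffer = io.BytesIO()
    written = write_pcm16_stream(fake_stream(), buffer)
    print(written, "frames,", written / SAMPLE_RATE, "seconds")
```

Raw PCM16 carries no header, so the sample rate and channel count have to be supplied by the consumer; writing them into a WAV header is one simple way to keep that information with the audio.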

Xiaomi opens MiMo-V2.5-TTS API: singing, natural language emotion control, and voice cloning from seconds of audio