ME News reports that xAI has officially launched two standalone audio APIs: Grok STT and Grok TTS. Grok STT offers high-accuracy, low-latency transcription through REST batch processing and WebSocket real-time streaming, with word-level timestamps, speaker diarization, multi-channel audio, and intelligent inverse text normalization across more than 25 languages. Batch transcription costs $0.10 per hour and streaming $0.20 per hour, and xAI reports word error rates lower than those of several competing commercial services. Grok TTS produces fast, natural speech that can be finely controlled with tags, priced at $4.20 per million characters. Both APIs are built on the same technology stack that powers Grok Voice, Tesla vehicles, and Starlink customer support.

MeNews

2026-05-25 20:56:03


ME News Report, April 18 (UTC+8): xAI has announced the official launch of two standalone audio APIs, Grok Speech-to-Text (STT) and Grok Text-to-Speech (TTS).

Grok STT provides high-accuracy, low-latency transcription through a REST API for batch processing and a WebSocket API for real-time streaming. Features include word-level timestamps, speaker diarization, multi-channel support, and intelligent inverse text normalization. According to the announcement, its word error rate is lower than that of mainstream commercial offerings such as ElevenLabs, Deepgram, and AssemblyAI in benchmarks covering phone calls, meetings, and video/podcast audio. The service supports more than 25 languages and costs $0.10 per hour for batch processing and $0.20 per hour for streaming.

Grok TTS generates fast, natural, expressive speech that can be finely controlled with simple voice tags, priced at $4.20 per 1 million characters. Both APIs are built on the same technology stack that powers Grok Voice, Tesla vehicles, and Starlink customer support. (Source: InfoQ)
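The benchmark claim is stated in terms of word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The article does not say which test sets were used or how transcripts were normalized before scoring, so the following is only a generic sketch of the standard metric in Python, not xAI's scoring code; the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Uses a word-level Levenshtein (edit) distance. Only lowercasing and
    whitespace splitting are applied here; real benchmarks usually also
    normalize punctuation, numbers, and similar variations.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,  # deletion: reference word missing from the hypothesis
                curr[j - 1] + 1,  # insertion: extra word in the hypothesis
                prev[j - 1] + substitution_cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words gives a WER of 1/6.
print(round(word_error_rate("The cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
```

Because insertions are counted against the reference length, WER can exceed 100% on badly hallucinated output, which is one reason comparisons are only meaningful on the same test sets with the same normalization rules.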

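Since STT is billed per hour of audio and TTS per character, a rough budget is straightforward arithmetic. The Python sketch below plugs in the list prices reported above; the function and constant names are hypothetical, and it assumes simple proportional billing, because the article does not describe rounding, minimum charges, or volume discounts.

```python
# List prices as reported in the article (USD). Billing granularity, minimum
# charges, and volume discounts are not described there, so this sketch
# assumes simple proportional billing.
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAMING_USD_PER_HOUR = 0.20
TTS_USD_PER_MILLION_CHARS = 4.20


def estimate_cost(batch_hours: float, streaming_hours: float, tts_characters: int) -> dict[str, float]:
    """Return an itemized cost estimate in USD, rounded to cents."""
    if min(batch_hours, streaming_hours, tts_characters) < 0:
        raise ValueError("usage amounts must be non-negative")
    batch = batch_hours * STT_BATCH_USD_PER_HOUR
    streaming = streaming_hours * STT_STREAMING_USD_PER_HOUR
    tts = tts_characters / 1_000_000 * TTS_USD_PER_MILLION_CHARS
    return {
        "stt_batch": round(batch, 2),
        "stt_streaming": round(streaming, 2),
        "tts": round(tts, 2),
        "total": round(batch + streaming + tts, 2),
    }


# 1,000 h of batch audio, 500 h of live streams, and 2 million characters of TTS
print(estimate_cost(batch_hours=1_000, streaming_hours=500, tts_characters=2_000_000))
# {'stt_batch': 100.0, 'stt_streaming': 100.0, 'tts': 8.4, 'total': 208.4}
```

For the same volume of audio, streaming costs exactly twice the batch rate, so workloads that do not need live results are cheaper to send through the batch endpoint.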

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.


Comment

GateUser-99725296

· 12h ago

Can prices this low be sustained? Let's see how Azure and AWS respond.

0xLateBreakfast

· 12h ago

Streaming costs double the batch rate; businesses will need to work out their costs carefully.

RiskOffRina

· 13h ago

Multi-channel support is crucial for conference transcription; finally, no need to align manually.

WalletHealthInspector

· 13h ago

So Grok Voice runs on these two APIs? Good that they've finally been offered separately.

MosaicBowtieRealm

· 13h ago

What exactly can the fine-grained control tags adjust: speaking rate, pitch, or emotion?
