ME News, April 18 (UTC+8): According to monitoring by Beating, xAI has launched two standalone audio APIs, Grok Speech to Text (STT) and Grok Text to Speech (TTS). Both are built on the same audio stack that powers Grok Voice, Tesla’s in-car system, and Starlink customer service. They are now exposed as independent endpoints that developers can integrate directly into applications such as voice agents, real-time transcription, accessibility tools, and podcasts.

STT is offered in two modes. The REST API handles batch transcription of large audio files, returning results at millisecond-level latency; the WebSocket API is designed for real-time audio streams. Capabilities include word-level timestamps, speaker separation (diarization), multi-channel recognition, and Inverse Text Normalization (ITN), which automatically converts spoken numbers, dates, and currencies into standardized written form. More than 25 languages are supported, with seamless switching between languages within a conversation.
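To make Inverse Text Normalization concrete, here is a toy sketch in Python. It is not xAI's implementation, and the function name `itn_numbers` is hypothetical. It only rewrites spelled-out numbers from 0 to 99 as digits, whereas a production ITN system also handles dates, currencies, ordinals, and context (for example, it would not turn "no one" into "no 1").

```python
# Toy inverse text normalization (ITN): spoken-form numbers -> digits.
# Illustrative only; covers 0-99 and ignores context, dates, and currencies.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def itn_numbers(text: str) -> str:
    """Replace spelled-out numbers (0-99) in a transcript with digits."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        w = words[i].lower()
        if w in TENS:
            value = TENS[w]
            # A tens word may be followed by a units word 1-9: "twenty five" -> 25
            if i + 1 < len(words) and 1 <= UNITS.get(words[i + 1].lower(), 0) <= 9:
                value += UNITS[words[i + 1].lower()]
                i += 1
            out.append(str(value))
        elif w in UNITS:
            out.append(str(UNITS[w]))
        else:
            out.append(words[i])
        i += 1
    return " ".join(out)


print(itn_numbers("send twenty five dollars to room seven"))
```

Running this prints `send 25 dollars to room 7`: the raw recognizer output is in spoken form, and ITN is the post-processing step that turns it into the written form a reader or downstream system expects.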

xAI also published a set of word error rate (WER, lower is better) comparisons. Overall: Grok 6.9%, ElevenLabs 9.0%, Deepgram 11.0%, AssemblyAI 12.9%. The gap widens on “phone-call entity recognition”: Grok 5.0%, versus 12.0%, 13.5%, and 21.3% for the other three, respectively. In common business scenarios such as meetings, video podcasts, and phone calls, Grok also holds a slight lead. These figures come from xAI’s own testing and have not yet been independently verified by a third party.
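For readers unfamiliar with the metric, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, which is a word-level edit distance. The Python sketch below computes it under that standard definition. The function name `word_error_rate` is hypothetical, and the sketch does no text normalization (casing, punctuation); how xAI normalized text and which test sets it used are not stated in the source, so this would not necessarily reproduce the published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
```

The example prints `0.2`, that is, a 20% WER. On this scale, the reported overall figure of 6.9% corresponds to roughly 7 word errors per 100 reference words.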

On pricing, batch STT costs $0.10 per hour and streaming STT costs $0.20 per hour; TTS costs $4.20 per 1 million characters. TTS supports inline Speech Tags, such as [laugh], [sigh], and [whisper], to control emotion and prosody.
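As a rough illustration of what these rates mean in practice, the Python sketch below estimates costs from the prices quoted above. It assumes that "per hour" means per hour of audio and that billing is strictly linear, with no minimum charge, rounding, or volume discount; none of this is confirmed by the source. The constant and function names are hypothetical.

```python
# Prices as quoted in the article (USD).
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAM_USD_PER_HOUR = 0.20
TTS_USD_PER_MILLION_CHARS = 4.20


def stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Estimated STT cost, assuming linear per-hour-of-audio billing."""
    rate = STT_STREAM_USD_PER_HOUR if streaming else STT_BATCH_USD_PER_HOUR
    return audio_hours * rate


def tts_cost(characters: int) -> float:
    """Estimated TTS cost, assuming linear per-character billing."""
    return characters / 1_000_000 * TTS_USD_PER_MILLION_CHARS


# 100 hours of podcast audio, batch vs. streaming, and a 500,000-character script.
print(f"batch STT:     ${stt_cost(100):.2f}")
print(f"streaming STT: ${stt_cost(100, streaming=True):.2f}")
print(f"TTS:           ${tts_cost(500_000):.2f}")
```

Under these assumptions, 100 hours of audio would cost about $10.00 in batch mode and $20.00 in streaming mode, and synthesizing 500,000 characters would cost about $2.10.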

(Source: BlockBeats)




GateUser-0f33f9ef

· 8h ago

What is Inverse Text Normalization? Is it some cutting-edge technology? Can someone knowledgeable explain it in detail?


WhitepaperByTheRoadside

· 15h ago

Word-level timestamps plus speaker separation: meeting transcription use cases are going to take off.


Lime-ColoredStop-LossLine

· 05-27 13:11

Batch processing at $0.10 per hour is really attractive, but pricing streaming at double the rate clearly pushes you toward batch.


GateUser-83a2dd8a

· 05-27 13:07

25+ languages covered. How is the Chinese performance? Has anyone tested it?


TheProphetOfToast

· 05-27 11:44

Inline tags for emotion and prosody. Finally, no more listening to robots read scripts.


GateUser-b665e41c

· 05-27 10:48

With the Tesla in-car system integration, what is the maximum latency, in milliseconds, for voice interaction while driving?


PunkRiskMgr

· 05-27 10:40

It is already in use in Starlink customer service, so rural accents have become a rich source of training data for speech recognition.


ToBeHonest,You'llLose

· 05-27 10:36

From LLMs to speech, the multimodal war officially enters the second half


HashbrownHero

· 05-27 10:35

With this bulk transcription price, subtitle groups and podcast hosts will probably need to migrate collectively.


GateUser-bee672a5

· 05-27 10:35

Waiting for the open-source community to reproduce the WER numbers. With xAI's benchmarks, skepticism is usually the right starting point.



xAI opens Grok STT and TTS audio APIs, with overall word error rate of STT reduced to 6.9%
