ME News reports that xAI has launched two standalone audio APIs, Grok STT and Grok TTS, built on the same audio stack that powers Grok Voice, Tesla in-car systems, and Starlink customer service, among others. STT offers REST batch transcription and WebSocket real-time streaming, with word-level timestamps, speaker diarization, multi-channel support, and inverse text normalization, and covers more than 25 languages; TTS supports inline tags for emotion and prosody. xAI also published WER comparisons in which Grok leads in multiple scenarios, but no third party has re-evaluated them yet. Pricing: STT batch processing at $0.10 per hour, STT streaming at $0.20 per hour, and TTS at $4.20 per million characters.

MeNews

2026-05-26 17:46:33


ME News, April 18 (UTC+8): According to Beating monitoring, xAI has launched two independent audio APIs: Grok Speech to Text (STT) and Grok Text to Speech (TTS). Both come from the same audio stack that supports Grok Voice, Tesla's in-car system, and Starlink customer service. They are now offered as standalone endpoints, so developers can integrate them directly into applications such as voice agents, real-time transcription, accessibility tools, and podcasts.

STT offers two modes. The REST API handles batch transcription of large audio files and is described as returning results with millisecond-level latency; the WebSocket API is designed for real-time audio streams. Capabilities include word-level timestamps, speaker diarization, multi-channel recognition (each channel is transcribed separately), and Inverse Text Normalization, which automatically formats spoken numbers, dates, and currencies as standardized structured text. The system supports more than 25 languages and can switch between them mid-conversation.
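To illustrate how word-level timestamps and speaker diarization are typically consumed together, here is a minimal sketch that merges consecutive words from the same speaker into turns. The `Word` and `Turn` shapes and their field names are assumptions made for illustration; the article does not describe the actual response schema of the Grok STT API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical shape of one transcribed word; not the real API schema.
    text: str
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: int   # diarization label assigned to this word


@dataclass
class Turn:
    speaker: int
    start: float
    end: float
    text: str


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words from the same speaker into a single turn."""
    turns: List[Turn] = []
    for w in words:
        if turns and turns[-1].speaker == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            last = turns[-1]
            last.end = w.end
            last.text += " " + w.text
        else:
            # Speaker changed (or first word): open a new turn.
            turns.append(Turn(w.speaker, w.start, w.end, w.text))
    return turns


words = [
    Word("Hello", 0.00, 0.40, 0),
    Word("there", 0.45, 0.80, 0),
    Word("Hi", 1.10, 1.30, 1),
    Word("again", 1.90, 2.30, 0),
]

for t in group_into_turns(words):
    print(f"[{t.start:.2f}-{t.end:.2f}] speaker {t.speaker}: {t.text}")
```

With multi-channel audio, the same grouping could be run once per channel, since each channel is recognized separately.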

xAI also released a set of word error rate (WER) comparisons (lower is better). Overall across scenarios: Grok 6.9%, ElevenLabs 9.0%, Deepgram 11.0%, AssemblyAI 12.9%. The gap is even larger for entity recognition in phone calls: Grok 5.0%, while the corresponding figures for the other three are 12.0%, 13.5%, and 21.3%, respectively. In common business scenarios such as meetings, video podcasts, and phone calls, Grok also holds a slight lead.
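For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The sketch below implements that standard definition; it is not xAI's evaluation code, and it skips the text normalization (casing, punctuation) that real benchmarks usually apply, so it will not reproduce the published figures.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("five" -> "nine") and one deletion ("tomorrow")
# against a six-word reference.
score = wer("call me at five pm tomorrow", "call me at nine pm")
print(f"{score:.3f}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.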

These figures are self-reported by xAI, and no third party has re-tested them yet. On pricing, STT batch processing costs $0.10 per hour of audio and streaming costs $0.20 per hour; TTS costs $4.20 per 1,000,000 characters. TTS supports inline Speech Tags to control emotion and intonation, for example [laugh], [sigh], and [whisper]. (Source: BlockBeats)
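The listed prices can be turned into a rough cost estimate. The sketch below uses only the three rates quoted above; the function and constant names are hypothetical, and it assumes simple linear billing with no minimums, rounding, or volume tiers, none of which the article specifies.

```python
# Rates as quoted in the article (USD).
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAMING_USD_PER_HOUR = 0.20
TTS_USD_PER_MILLION_CHARS = 4.20


def stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Estimated STT cost, assuming billing is linear in audio duration."""
    rate = STT_STREAMING_USD_PER_HOUR if streaming else STT_BATCH_USD_PER_HOUR
    return audio_hours * rate


def tts_cost(num_chars: int) -> float:
    """Estimated TTS cost, assuming billing is linear in character count."""
    return num_chars / 1_000_000 * TTS_USD_PER_MILLION_CHARS


print(f"100 h batch:      ${stt_cost(100):.2f}")
print(f"100 h streaming:  ${stt_cost(100, streaming=True):.2f}")
print(f"250k characters:  ${tts_cost(250_000):.2f}")
```

Under these assumptions, 100 hours of audio comes to about $10 in batch mode and $20 in streaming mode, and 250,000 characters of speech synthesis comes to about $1.05.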





Comment


TwoFactorZen

· 7h ago

WebSocket real-time streaming at $0.20/hour: anyone building live subtitles can do the math.


Frost-ColoredCubeCity

· 10h ago

The batch pricing is okay, but charging double for streaming clearly pushes you toward batch. It's an old trick.


GateUser-517aed04

· 10h ago

The same stack feeds Tesla's in-car system and Starlink customer service. Musk is really good at playing this closed loop.


GateUser-b6d80ba0

· 10h ago

The WER numbers are self-tested and self-promoted, released without waiting for third-party re-testing. Old crypto hands know what that means.

