ME News reports that xAI has launched two standalone audio APIs: Grok STT (speech-to-text) and Grok TTS (text-to-speech). Both come from the same audio stack that powers Grok Voice, Tesla's in-car system, and Starlink customer service, among others. STT offers batch transcription over REST and real-time streaming over WebSocket, with word-level timestamps, speaker diarization, multichannel support, and inverse text normalization, covering more than 25 languages. TTS supports inline tags for controlling emotion and prosody. xAI also published word error rate (WER) comparisons in which Grok leads in multiple scenarios, though no third party has re-evaluated them yet. Pricing: STT costs $0.10 per hour for batch processing and $0.20 per hour for streaming; TTS costs $4.20 per million characters.

MeNews

2026-05-27 05:49:03


ME News, April 18 (UTC+8): according to Beating monitoring, xAI has launched two standalone audio APIs, Grok Speech to Text and Grok Text to Speech. Both come from the same audio stack that powers Grok Voice, Tesla's in-car system, and Starlink customer service. Now offered as standalone endpoints, they let developers integrate that stack directly into applications such as voice agents, real-time transcription, accessibility tools, and podcasts.

STT offers two modes. The REST API handles batch transcription of large audio files, with millisecond-level response; the WebSocket API is designed for real-time audio streams. Supported capabilities include word-level timestamps, speaker separation (diarization), multichannel recognition in which each channel is transcribed separately, and inverse text normalization (ITN), which automatically converts spoken-form numbers, dates, and currency amounts in conversational speech into standard written form. Coverage spans more than 25 languages, with seamless switching between languages within a conversation.
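To make inverse text normalization concrete, here is a toy, rule-based sketch in Python. It is not xAI's implementation and does not call any xAI API; the function name `inverse_normalize` and its narrow coverage (English numbers from zero to ninety-nine, optionally followed by "dollar(s)" or "percent") are illustrative assumptions. Production ITN, which also handles dates, larger amounts, and context, is considerably more involved.

```python
import re

# Toy inverse text normalization (ITN) sketch: NOT xAI's implementation.
# Covers only English numbers 0-99, optionally followed by "dollar(s)" or "percent".
UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

UNIT_VALUE = {word: i for i, word in enumerate(UNITS)}
TENS_VALUE = {word: 20 + 10 * i for i, word in enumerate(TENS)}

# Longest words first so that, e.g., "seventeen" is tried before "seven".
_units_alt = "|".join(sorted(UNITS, key=len, reverse=True))
_tens_alt = "|".join(TENS)
_ones_alt = "|".join(UNITS[1:10])  # "one".."nine", allowed after a tens word

_NUMBER = rf"(?:(?:{_tens_alt})(?:[ -](?:{_ones_alt}))?|{_units_alt})"
_PATTERN = re.compile(rf"\b({_NUMBER})(?:\s+(dollars?|percent))?\b", re.IGNORECASE)


def _spoken_to_int(phrase: str) -> int:
    """Convert a matched phrase such as 'forty-two' or 'seven' to an int."""
    parts = re.split(r"[ -]", phrase.lower())
    if parts[0] in TENS_VALUE:
        return TENS_VALUE[parts[0]] + (UNIT_VALUE[parts[1]] if len(parts) > 1 else 0)
    return UNIT_VALUE[parts[0]]


def inverse_normalize(text: str) -> str:
    """Rewrite spoken-form numbers, dollar amounts, and percentages into written form."""
    def replace(match: re.Match) -> str:
        value = _spoken_to_int(match.group(1))
        unit = (match.group(2) or "").lower()
        if unit.startswith("dollar"):
            return f"${value}"
        if unit == "percent":
            return f"{value}%"
        return str(value)

    return _PATTERN.sub(replace, text)


print(inverse_normalize("the refund is twenty-five dollars, up seventeen percent"))
# -> the refund is $25, up 17%
```

The word-boundary anchors keep substrings such as the "one" in "someone" or the "ten" in "often" from being rewritten, which is the kind of edge case that makes hand-written ITN rules tedious to maintain.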

xAI also released a set of word error rate (WER) comparisons (lower is better). Overall: Grok 6.9%, ElevenLabs 9.0%, Deepgram 11.0%, AssemblyAI 12.9%. The gap is even wider for telephone-call entity recognition: Grok 5.0%, versus 12.0%, 13.5%, and 21.3% for ElevenLabs, Deepgram, and AssemblyAI, respectively. In common business scenarios such as meetings, video podcasts, and phone calls, Grok is also slightly ahead across the board. These figures come from xAI's own testing and have not yet been independently verified by any third party.
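For readers who want to sanity-check such numbers on their own audio, WER is conventionally the word-level edit distance between a reference transcript and the system output, divided by the number of reference words: (substitutions + deletions + insertions) / reference words. The minimal Python sketch below implements that definition. It assumes plain whitespace tokenization with no text normalization, whereas published benchmarks typically lowercase and strip punctuation first; the source does not say how xAI normalized its test sets.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein distance. Tokenization is a plain
    whitespace split; no casing or punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because WER is a ratio of errors to reference words, the published gaps can also be read in relative terms: if the figures hold, 6.9% versus 9.0% means roughly 23% fewer word errors than ElevenLabs on xAI's overall test set ((9.0 - 6.9) / 9.0 ≈ 0.233).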

Pricing: STT costs $0.10 per hour for batch processing and $0.20 per hour for streaming; TTS costs $4.20 per 1 million characters. TTS supports inline speech tags for controlling emotion and intonation, such as [laugh], [sigh], [whisper], and [.]. (Source: BlockBeats)
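The list prices above translate into simple linear estimates. The Python sketch below assumes billing is prorated per second of audio and per character with no minimums or rounding, and that `len()` (Unicode code points) matches how xAI counts characters; none of that is confirmed by the source. Whether inline speech tags such as [laugh] count as billable characters is also not stated, so the hypothetical `bill_tags` flag makes that assumption explicit. The function names are illustrative, and the code does not call any xAI endpoint.

```python
import re

# List prices in USD as reported in the article; confirm against xAI's current pricing.
STT_BATCH_PER_HOUR = 0.10
STT_STREAMING_PER_HOUR = 0.20
TTS_PER_MILLION_CHARS = 4.20

# Matches inline speech tags such as [laugh], [sigh], [whisper], and [.].
SPEECH_TAG = re.compile(r"\[[^\[\]]*\]")


def stt_cost(audio_seconds: float, streaming: bool = False) -> float:
    """Estimated STT cost, assuming linear per-second proration of the hourly rate."""
    rate = STT_STREAMING_PER_HOUR if streaming else STT_BATCH_PER_HOUR
    return audio_seconds / 3600 * rate


def tts_cost(text: str, bill_tags: bool = True) -> float:
    """Estimated TTS cost. bill_tags=False assumes speech tags are not billed."""
    billable = text if bill_tags else SPEECH_TAG.sub("", text)
    return len(billable) / 1_000_000 * TTS_PER_MILLION_CHARS


# 1,000 hours of recorded calls, transcribed in batch vs. streamed.
print(f"batch:     ${stt_cost(1_000 * 3600):.2f}")                  # $100.00
print(f"streaming: ${stt_cost(1_000 * 3600, streaming=True):.2f}")  # $200.00
# A 500,000-character script.
print(f"tts:       ${tts_cost('x' * 500_000):.2f}")                 # $2.10
```

Keeping the rates as module-level constants makes it easy to swap in updated prices or a competitor's rates for a side-by-side estimate.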

Comments

GateUser-7919e6b9

Batch STT costs only $0.10/hour, which is cheaper than the Whisper API.

GateUser-28f37882

The same stack feeds Grok Voice, in-car systems, and Starlink; this round of resource integration at xAI has some substance to it.

Don'tMessWithSlippage.

Grok's audio stack is finally open to the public, and Tesla owners are ecstatic.

ReflectiveChainShadow

WebSocket real-time streaming at $0.20 per hour: can it handle live captioning smoothly?

MossyLedger

A WER comparison without third-party re-testing? Let's wait and see for a while first.

MistBlueLily

Inverse text normalization is so useful for building voice assistants; I finally don't have to write the rules myself.

NodeUnderTheAurora

TTS at $4.20 per million characters: is that cheaper or more expensive than ElevenLabs? Has anyone worked it out?

xAI opens Grok STT and TTS audio APIs, with an overall STT word error rate of 6.9%
