xAI's audio one-two punch hits hard: STT streaming is priced at $0.20 per hour, competing directly with Whisper, and TTS supports emotion tags. Is Elon trying to turn the speech market into a red ocean?

MeNews
xAI opens Grok STT and TTS audio APIs; overall STT word error rate drops to 6.9%
ME News reports that xAI has launched two standalone audio APIs, Grok STT and Grok TTS. Both are built on the same audio stack that powers Grok Voice, Tesla in-car systems, and Starlink customer service, among others. STT offers REST batch transcription and WebSocket real-time streaming, with word-level timestamps, speaker separation (diarization), multi-channel support, and inverse text normalization, and covers more than 25 languages. TTS supports inline tags for emotion and prosody. xAI also published word error rate (WER) comparisons in which Grok leads in multiple scenarios, but the results have not yet been independently re-evaluated by a third party. Pricing: STT batch processing costs $0.10 per hour and streaming costs $0.20 per hour; TTS costs $4.20 per million characters.
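To put the quoted prices in perspective, here is a rough cost sketch in Python. It uses only the three rates reported above. The function names are hypothetical, and it assumes billing is strictly linear with no minimums, rounding increments, or volume discounts, none of which the report confirms.

```python
# Rates quoted in the ME News report (USD).
STT_BATCH_PER_HOUR = 0.10      # batch transcription, per audio hour
STT_STREAMING_PER_HOUR = 0.20  # real-time streaming, per audio hour
TTS_PER_MILLION_CHARS = 4.20   # text-to-speech, per million characters


def stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Estimated STT cost, assuming simple linear per-hour billing."""
    rate = STT_STREAMING_PER_HOUR if streaming else STT_BATCH_PER_HOUR
    return audio_hours * rate


def tts_cost(characters: int) -> float:
    """Estimated TTS cost, assuming simple linear per-character billing."""
    return characters / 1_000_000 * TTS_PER_MILLION_CHARS


if __name__ == "__main__":
    hours = 1_000
    chars = 2_500_000
    print(f"Batch STT, {hours} h:     ${stt_cost(hours):.2f}")
    print(f"Streaming STT, {hours} h: ${stt_cost(hours, streaming=True):.2f}")
    print(f"TTS, {chars:,} chars:  ${tts_cost(chars):.2f}")
```

Under these assumptions, streaming costs exactly twice as much as batch for the same amount of audio, so the choice between the two modes comes down to whether real-time results are worth the doubled rate.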