Grok STT's word-level timestamps and speaker diarization are essential for podcast editing teams, but its WER figures are self-reported and still await third-party evaluation.

MeNews
xAI opens Grok STT and TTS audio APIs, with STT's overall word error rate reduced to 6.9%
According to ME News, xAI has launched two standalone audio APIs, Grok STT and Grok TTS. Both are built on the same audio stack that powers Grok Voice, Tesla in-car systems, and Starlink customer service, among others. STT offers batch transcription over REST and real-time streaming over WebSocket, with word-level timestamps, speaker diarization, multi-channel support, and inverse text normalization, covering more than 25 languages. TTS supports inline tags for emotion and prosody. xAI also published WER comparisons in which Grok leads in multiple scenarios; no third-party evaluation is available yet. Pricing: STT batch processing costs $0.10 per hour and streaming $0.20 per hour; TTS costs $4.20 per million characters.
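To put the quoted prices in concrete terms, here is a minimal cost-estimate sketch. It assumes billing is strictly linear in audio hours and characters, with no minimums, rounding rules, or volume tiers, none of which the report confirms. The function names are hypothetical and are not part of any xAI SDK.

```python
from decimal import Decimal

# Prices in USD as quoted in the report above.
STT_BATCH_PER_HOUR = Decimal("0.10")
STT_STREAM_PER_HOUR = Decimal("0.20")
TTS_PER_MILLION_CHARS = Decimal("4.20")


def stt_cost(audio_hours, streaming=False):
    """Estimated STT cost in USD (hypothetical helper, assumes linear billing)."""
    rate = STT_STREAM_PER_HOUR if streaming else STT_BATCH_PER_HOUR
    # Go through str() so float inputs such as 1.5 convert cleanly.
    return Decimal(str(audio_hours)) * rate


def tts_cost(characters):
    """Estimated TTS cost in USD (hypothetical helper, assumes linear billing)."""
    return Decimal(characters) * TTS_PER_MILLION_CHARS / Decimal(1_000_000)


if __name__ == "__main__":
    # A podcast team transcribing 50 hours of recordings in batch mode,
    # the same 50 hours via streaming, and 250,000 characters of TTS.
    print("STT batch, 50 h:      $", stt_cost(50))
    print("STT streaming, 50 h:  $", stt_cost(50, streaming=True))
    print("TTS, 250k characters: $", tts_cost(250_000))
```

Under these assumptions, 50 hours of batch transcription comes to 5 USD, the same audio over streaming to 10 USD, and 250,000 characters of synthesized speech to 1.05 USD.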
Comment
GateUser-dd0c6b87
· 12h ago
It's not the same thing at all, don't fool people.