The Grok Voice Suite is now integrated into the API, and Tesla vehicle systems and Starlink customer service are already using it. The self-reported WER data looks good, but let's wait for a third-party re-evaluation.

MeNews
xAI opens Grok STT and TTS audio APIs, with overall STT word error rate reduced to 6.9%
According to ME News, xAI has launched two standalone audio APIs, Grok STT and Grok TTS. Both come from the same audio stack that supports Grok Voice, Tesla in-car systems and Starlink customer service, among others. STT offers REST batch transcription and WebSocket real-time streaming, with word-level timestamps, speaker separation (diarization), multi-channel support and inverse text normalization (for example, turning spoken "twenty dollars" into "$20"), and covers more than 25 languages. TTS supports inline tags for controlling emotion and prosody. xAI also published WER comparisons in which Grok leads in multiple scenarios; no third-party re-evaluation is available yet. Pricing: STT costs $0.10 per hour for batch processing and $0.20 per hour for streaming, and TTS costs $4.20 per million characters.
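Two numbers in the announcement are easy to check yourself: the word error rate and the bill. Below is a minimal sketch, not xAI code. It assumes WER is the standard word-level Levenshtein definition (substitutions + deletions + insertions, divided by the number of reference words), so 6.9% means roughly 7 word errors per 100 reference words. It also assumes the STT prices are per hour of input audio. The function names `word_error_rate` and `estimate_cost` are hypothetical, and the cost estimate ignores any minimums, rounding rules, taxes or discounts xAI may apply.

```python
from decimal import Decimal

# List prices from the announcement (USD). Assumption: STT is billed per hour
# of input audio, and TTS per million input characters.
STT_BATCH_PER_HOUR = Decimal("0.10")
STT_STREAMING_PER_HOUR = Decimal("0.20")
TTS_PER_MILLION_CHARS = Decimal("4.20")


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def estimate_cost(batch_hours=0, streaming_hours=0, tts_chars=0) -> Decimal:
    """Hypothetical helper: rough bill from the list prices above."""
    return (
        Decimal(str(batch_hours)) * STT_BATCH_PER_HOUR
        + Decimal(str(streaming_hours)) * STT_STREAMING_PER_HOUR
        + Decimal(str(tts_chars)) * TTS_PER_MILLION_CHARS / Decimal(1_000_000)
    )


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words: 2/6
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # 1,000 h batch + 100 h streaming + 2M TTS characters: 100 + 20 + 8.40
    print(estimate_cost(batch_hours=1000, streaming_hours=100, tts_chars=2_000_000))
```

Using `Decimal` keeps cent amounts exact. A real third-party re-evaluation would also normalize casing and punctuation and use a shared test set before comparing WER across vendors, which this sketch does not do.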
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.