xAI's audio API pricing is interesting: batch STT costs only $0.10 per hour of audio, while TTS looks more expensive at $4.20 per million characters. Is that premium what you pay for the emotion tags?

MeNews
xAI opens Grok STT and TTS audio APIs; overall STT word error rate drops to 6.9%
ME News reports that xAI has launched two standalone audio APIs, Grok STT and Grok TTS. Both are built on the same audio stack that powers Grok Voice, Tesla in-car systems, Starlink customer service, and other products. STT offers REST batch transcription and WebSocket real-time streaming, with word-level timestamps, speaker diarization, multi-channel support, and inverse text normalization, and covers more than 25 languages. TTS supports inline tags for emotion and prosody. xAI also published word error rate (WER) comparisons in which Grok leads in multiple scenarios, but no third party has re-evaluated the results yet. Pricing: STT batch processing costs $0.10 per hour, STT streaming costs $0.20 per hour, and TTS costs $4.20 per million characters.
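The STT and TTS prices use different units (hours of audio versus characters of text), so they cannot be compared directly. The sketch below is only a back-of-the-envelope cost estimator built from the three prices quoted above. The function names are hypothetical and are not part of any xAI SDK, and the speaking rate used to convert characters into hours (150 words per minute, about 6 characters per word including spaces) is an assumption of this example, not a figure from the article.

```python
# Prices quoted in the article, in USD.
STT_BATCH_PER_HOUR = 0.10      # batch transcription, per hour of audio
STT_STREAM_PER_HOUR = 0.20     # streaming transcription, per hour of audio
TTS_PER_MILLION_CHARS = 4.20   # speech synthesis, per million input characters

# ASSUMPTION (not from the article): roughly 150 words per minute and
# about 6 characters per word including spaces, i.e. 54,000 chars per hour.
ASSUMED_CHARS_PER_HOUR = 150 * 6 * 60


def stt_cost(hours: float, streaming: bool = False) -> float:
    """Estimated STT cost in USD for a given number of audio hours."""
    rate = STT_STREAM_PER_HOUR if streaming else STT_BATCH_PER_HOUR
    return hours * rate


def tts_cost(characters: int) -> float:
    """Estimated TTS cost in USD for a given number of input characters."""
    return characters / 1_000_000 * TTS_PER_MILLION_CHARS


def tts_cost_per_spoken_hour(chars_per_hour: int = ASSUMED_CHARS_PER_HOUR) -> float:
    """Estimated TTS cost in USD per hour of generated speech,
    under the assumed speaking rate."""
    return tts_cost(chars_per_hour)


if __name__ == "__main__":
    print(f"STT batch, 1 hour:     ${stt_cost(1):.2f}")
    print(f"STT streaming, 1 hour: ${stt_cost(1, streaming=True):.2f}")
    print(f"TTS, 1 spoken hour:    ${tts_cost_per_spoken_hour():.2f} (assumed rate)")
```

Under that assumed speaking rate, an hour of synthesized speech works out to roughly $0.23, which is close to the streaming STT price rather than far above it. A different speaking rate or language would shift the figure, so treat it as an order-of-magnitude check only.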