Grok STT's word error rate is reported to be lower than that of many competitors, and Grok TTS lets users fine-tune tone with tags. The same technology stack powers Grok Voice, Tesla, and Starlink, which suggests Elon Musk is aiming to control the entire speech interaction pipeline from upstream to downstream.

MeNews
xAI launches Grok speech-to-text and text-to-speech API
ME News reports that xAI has officially launched two standalone audio APIs: Grok STT and Grok TTS. Grok STT is described as offering high accuracy and low latency. It supports batch processing over REST and real-time transcription over WebSocket, with word-level timestamps, speaker diarization, multi-channel audio, and intelligent inverse text normalization, and it covers more than 25 languages. Batch transcription is priced at $0.10 per hour and streaming at $0.20 per hour, and its word error rate is said to be better than that of many competitors. Grok TTS offers fast, natural-sounding speech that can be finely controlled through tags, priced at $4.20 per million characters. Both are built on the same technology stack used by Grok Voice, Tesla, and Starlink.
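To put the quoted prices in perspective, the arithmetic can be sketched in a few lines of Python. This is only a rough estimate, not xAI's billing logic: the report does not say how usage is rounded or whether minimums apply, so the sketch assumes simple linear proration, and the function names `stt_cost` and `tts_cost` are hypothetical helpers that do not come from any xAI SDK.

```python
# Prices as quoted in the report, in USD.
# Assumption: cost scales linearly with usage; rounding rules and
# minimum charges are not stated in the report and are ignored here.
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAMING_USD_PER_HOUR = 0.20
TTS_USD_PER_MILLION_CHARS = 4.20


def stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Estimate the transcription cost in USD for a given amount of audio."""
    rate = STT_STREAMING_USD_PER_HOUR if streaming else STT_BATCH_USD_PER_HOUR
    return audio_hours * rate


def tts_cost(characters: int) -> float:
    """Estimate the speech synthesis cost in USD for a given character count."""
    return characters / 1_000_000 * TTS_USD_PER_MILLION_CHARS


if __name__ == "__main__":
    print(f"100 h batch STT:     ${stt_cost(100):.2f}")
    print(f"100 h streaming STT: ${stt_cost(100, streaming=True):.2f}")
    print(f"500,000 chars TTS:   ${tts_cost(500_000):.2f}")
```

Under these assumptions, 100 hours of audio would cost about $10 in batch mode or $20 when streamed, and synthesizing 500,000 characters would cost about $2.10.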