Elon Musk: Grok V9 is far ahead of V8, and the completed V9 training run already performs very well

AIMPACT News, May 15 (UTC+8): Elon Musk posted on X that the most recently completed Grok V9 training run (1.5T parameters) "performed very well," and that this result does not yet include the supplementary training on Cursor data. V9 is the base model currently under internal development, with approximately 1.5 trillion parameters. According to Musk, it improves significantly on V8 in data cleaning, training methods, and model scale, and is optimized for the Blackwell architecture to increase computational efficiency. By comparison, Musk said, the current external version, v4.2, is built on the V8 base model, has about 0.5T parameters, runs on the Hopper architecture, and still has limitations in training data quality and coverage. He described the performance gap between Grok V8 and V9 as huge, with the new generation delivering a leap in overall capabilities. (Source: ODAILY)
GateUser-dcb4d0d5
· 17h ago
Have V8 users become the victims? They had only just gotten familiar with it, and they are already a generation behind.
SlippagePoet
· 18h ago
The training cost for 1.5 trillion parameters is unthinkable; xAI's cash burn rate is even more aggressive than SpaceX's.
StardustUnderTheGlassDome
· 23h ago
I'm curious whether the hallucination rate of V9 has improved; bigger models aren't always better.
SudoSage
· 05-25 07:25
The phrase "leapfrog upgrade" coming from Elon Musk usually means there's really something behind it.
YieldKaraoke
· 05-25 06:27
Still running Hopper externally while already on Blackwell internally; the information gap is obvious.
GateUser-53a6e1a8
· 05-25 06:14
Data cleaning has finally gained attention; previously, the quality of Grok's responses was indeed inconsistent.
Don'tCallMeABagHolder.
· 05-25 06:08
Blackwell architecture optimization is the key point; improving computing power utilization directly determines profitability.
LimeLeverageAlert
· 05-25 06:05
Waiting for a Cursor integration; the parameter scale in V9 is a bit outrageous.
BlackVelvetBluePeony
· 05-25 06:04
The Blackwell optimization shows that Old Huang (Jensen Huang) and Musk are becoming more and more tightly bound.
Post-RainCancellationAgent
· 05-25 06:01
Going from 0.5T to 1.5T is three times the parameters; the gap is indeed significant.