xAI owns 500k GPUs but utilization is only 11%

CryptoWorld News: Elon Musk’s xAI owns approximately 500k NVIDIA GPUs, one of the largest clusters among AI developers according to publicly available data. However, an internal memo shows that xAI’s MFU (Model FLOPs Utilization, the ratio of the compute a chip actually delivers to its theoretical peak) has been only about 11% in recent weeks. A researcher from a competing lab said that most companies find it difficult to exceed 40%, but called 11% “absurdly low.” Low utilization is a common industry problem. One reason is that AI training is intermittent: GPUs run at full capacity during a training run, but sit idle while researchers analyze results and decide on next steps. There are also hardware bottlenecks: high-bandwidth memory (HBM) cannot supply data as fast as the compute units can process it, and data transfer among thousands of GPUs can be slowed by any weak link in the network. The industry also has a practice referred to as “data padding”: a researcher from a large lab said that colleagues repeatedly rerun training experiments to inflate utilization figures, partly to avoid criticism from superiors and partly to prevent idle GPUs from being reassigned to other teams.
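For readers unfamiliar with the metric, the arithmetic behind MFU can be sketched in a few lines of Python. This is only an illustration of the ratio described above, not xAI's method of measurement: the function names, the per-GPU peak figure, the cluster size, and the achieved throughput are all hypothetical values chosen to make the math easy to follow.

```python
def mfu(achieved_flops_per_s: float, peak_flops_per_gpu: float, num_gpus: int) -> float:
    """Return achieved throughput divided by the cluster's theoretical peak."""
    if peak_flops_per_gpu <= 0 or num_gpus <= 0:
        raise ValueError("peak_flops_per_gpu and num_gpus must be positive")
    return achieved_flops_per_s / (peak_flops_per_gpu * num_gpus)


def time_averaged_mfu(mfu_while_training: float, busy_fraction: float) -> float:
    """Scale in-job MFU by the fraction of time the GPUs are actually training.

    Assumes idle GPUs contribute zero useful FLOPs, which is a simplification.
    """
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    return mfu_while_training * busy_fraction


# Hypothetical numbers, not taken from the memo.
PEAK_FLOPS_PER_GPU = 1.0e15   # assumed peak per GPU, in FLOP/s
NUM_GPUS = 1_000              # assumed cluster slice
ACHIEVED_FLOPS = 1.1e17       # assumed measured useful FLOP/s across the slice

print(f"MFU: {mfu(ACHIEVED_FLOPS, PEAK_FLOPS_PER_GPU, NUM_GPUS):.0%}")
print(f"Time-averaged MFU: {time_averaged_mfu(0.40, 0.5):.0%}")
```

The second function shows why intermittent training drags the figure down: under the stated simplification, a job that reaches 40% while running but keeps the GPUs busy only half the time averages out to 20%.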
