Zhipu GLM-5V-Turbo Technical Report: Beats Claude Opus 4.6 on Design2Code, Generating Code Directly from Screenshots


CryptoWorld News reports that Zhipu AI has released the GLM-5V-Turbo technical report; the model launched on the z.ai API and OpenRouter in early April. The report fills in the methodology, while the model itself has not been open-sourced. GLM-5V-Turbo is Zhipu's first multimodal programming foundation model: it supports a context length of around 200k tokens and can connect to agent frameworks such as Claude Code and OpenClaw. From the pre-training stage onward, visual perception is integrated into the full loop of reasoning, planning, tool invocation, and execution. The architecture has three key design elements: a new visual encoder, CogVit, pre-trained via dual-teacher distillation from SigLIP2 and DINOv3; multimodal multi-token prediction (MMTP) aligned through contrastive learning on 8 billion Chinese-English bilingual image-text pairs; and a shared learnable special token that replaces direct transmission of visual embeddings, reducing communication complexity across pipeline stages and yielding more stable joint reinforcement learning across three levels: perception, reasoning, and agent execution. On benchmarks, GLM-5V-Turbo scores 94.8 on Design2Code, surpassing Claude Opus 4.6.
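The report summary mentions dual-teacher distillation from SigLIP2 and DINOv3 but gives no loss details. A minimal stdlib-only sketch of one plausible form of such an objective: the student embedding is pulled toward both teacher embeddings with a weighted regression loss. The MSE objective, the 0.5 weighting, and all vector values here are illustrative assumptions, not details from the report.

```python
def mse(a, b):
    """Mean squared error between two equal-length embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def dual_teacher_loss(student, teacher_a, teacher_b, alpha=0.5):
    """Weighted sum of distillation losses against two teachers.

    alpha balances the two teachers; the actual weighting and loss
    form used for CogVit are not disclosed in the report.
    """
    return alpha * mse(student, teacher_a) + (1 - alpha) * mse(student, teacher_b)

# Toy 4-dim embeddings standing in for CogVit (student) and the
# SigLIP2 / DINOv3 teachers (hypothetical values).
student = [0.1, 0.2, 0.3, 0.4]
siglip  = [0.0, 0.2, 0.4, 0.4]
dino    = [0.2, 0.2, 0.2, 0.4]

loss = dual_teacher_loss(student, siglip, dino)
```

In a real training setup the student would be optimized to minimize this loss over large batches of image-text data; the sketch only shows the shape of the objective.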
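The claimed benefit of the shared learnable special token is reduced communication across pipeline stages. A back-of-the-envelope sketch of why, with assumed (not reported) sizes: passing raw visual embeddings costs one vector per image patch, while a shared token only requires each stage to look up its own learned embedding, so essentially one token's worth of state crosses the stage boundary.

```python
def payload_floats(num_patches, hidden_dim, use_shared_token):
    """Floats transferred between pipeline stages per image.

    Direct embedding transmission sends num_patches * hidden_dim floats;
    with a shared learnable special token, each stage holds the embedding
    locally, so at most hidden_dim floats (one token) cross the boundary.
    num_patches and hidden_dim are illustrative assumptions.
    """
    return hidden_dim if use_shared_token else num_patches * hidden_dim

dense  = payload_floats(num_patches=1024, hidden_dim=4096, use_shared_token=False)
shared = payload_floats(num_patches=1024, hidden_dim=4096, use_shared_token=True)
```

Under these assumed sizes the per-image payload shrinks by a factor equal to the patch count, which is the kind of reduction that would make pipeline-parallel reinforcement learning cheaper to synchronize.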
