Zhipu releases the native multimodal coding foundation model GLM-5V-Turbo


People’s Finance and Information, April 2: On April 2, Zhipu released GLM-5V-Turbo, its first native multimodal coding foundation model. The model’s key breakthrough is the deep integration of visual and programming capabilities: it natively processes multimodal inputs such as text, images, and video, while also handling complex tasks like programming, long-range planning, and execution. GLM-5V-Turbo achieved leading results on key multimodal coding and agent benchmarks. Despite the addition of visual capabilities, its pure-text programming and reasoning performance remains on par, and it is deeply adapted to Claude Code and Lobster scenarios, giving OpenClaw Lobster genuine visual capabilities so that it can understand what is shown on the screen. The model is now available via Zhipu’s MaaS platform.
