llama.cpp officially supports WebGPU, cutting GPU memory use for in-browser inference by about 30%
According to Beating Monitoring, the official WebGPU backend for llama.cpp and ggml has been released, allowing large models in GGUF format to run directly in the browser with local GPU acceleration. The new backend removes the dependence on specific native clients or complex WebAssembly architectures. Inference runs on-device and no data leaves the device, which preserves privacy and gives the web ecosystem a zero-configuration entry point to local compute power.
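Whether a page can use this path depends on the browser exposing WebGPU. A minimal TypeScript sketch of that check follows. It relies only on the standard `navigator.gpu.requestAdapter()` entry point; the `GpuLike` and `NavigatorLike` types and both function names are illustrative assumptions and are not part of llama.cpp or wllama.

```typescript
// Minimal structural types (assumptions) so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Synchronous check: does the environment expose the WebGPU entry point at all?
function exposesWebGpu(nav: NavigatorLike | undefined): boolean {
  return nav != null && nav.gpu != null;
}

// Asynchronous check: the entry point exists and an adapter can be obtained.
// requestAdapter() resolves to null when no suitable GPU is available.
async function canUseWebGpu(nav: NavigatorLike | undefined): Promise<boolean> {
  if (nav == null || nav.gpu == null) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter != null;
  } catch {
    return false; // treat any failure as "WebGPU not available"
  }
}
```

In a browser this could be called as `canUseWebGpu(navigator as NavigatorLike)`, with the page falling back to a CPU path when the result is `false`.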
A related paper published on May 20 states that the WebGPU backend introduces static memory planning and an efficient model-loading mechanism, reducing runtime GPU memory overhead on the web by 29% to 33% compared with existing frameworks. On mainstream GPUs from Intel, Apple, and NVIDIA, average decoding throughput improves by 45% to 69%.
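To put those percentages in concrete terms, the TypeScript sketch below applies the reported ranges to a baseline. The baseline values (4096 MiB of GPU memory and 20 tokens per second) are hypothetical and chosen only for illustration; they are not measurements from the paper, and the helper names are likewise assumptions.

```typescript
// Apply a fractional reduction range to a baseline and return [low, high].
function afterReduction(baseline: number, minCut: number, maxCut: number): [number, number] {
  return [baseline * (1 - maxCut), baseline * (1 - minCut)];
}

// Apply a fractional gain range to a baseline and return [low, high].
function afterGain(baseline: number, minGain: number, maxGain: number): [number, number] {
  return [baseline * (1 + minGain), baseline * (1 + maxGain)];
}

// Hypothetical baselines, not figures from the paper.
const baselineMemMiB = 4096;
const baselineTokPerSec = 20;

const [memLow, memHigh] = afterReduction(baselineMemMiB, 0.29, 0.33);
const [tpsLow, tpsHigh] = afterGain(baselineTokPerSec, 0.45, 0.69);

console.log(`GPU memory: ${memLow.toFixed(0)} to ${memHigh.toFixed(0)} MiB`);
console.log(`Decoding: ${tpsLow.toFixed(1)} to ${tpsHigh.toFixed(1)} tokens/s`);
```

Under these assumed baselines, the reported ranges correspond to roughly 2744 to 2908 MiB of GPU memory and roughly 29 to 34 tokens per second.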
The web demo is built on the open-source library wllama. Recent low-level optimizations have achieved better GPU memory control than the paper describes. llama.cpp can also be compiled natively against Dawn, Google’s C++ implementation of WebGPU, which provides a baseline for benchmarking WebGPU against Vulkan.
(Source: BlockBeats)