Ollama Rebuilds Apple Silicon Inference Engine with MLX: Decode Speed Nearly Doubles, Compatible with Claude Code
According to monitoring by 1M AI News, Ollama has released preview version 0.19, which rebuilds its inference engine on Apple Silicon using Apple's machine learning framework MLX. The new engine leverages Apple Silicon's unified memory architecture and the GPU neural accelerators on M5/M5 Pro/M5 Max chips to improve both first-token latency and generation speed.

Benchmark tests conducted on March 29 with the Qwen3.5-35B-A3B model (NVIDIA NVFP4 quantization) on M5-series chips showed prefill speed rising from 1154 tokens/s to 1810 tokens/s, and decode speed improving from 58 tokens/s to 112 tokens/s, nearly doubling. Switching to int4 precision pushes prefill further to 1851 tokens/s, with decode reaching 134 tokens/s.

Version 0.19 also adds support for the NVIDIA NVFP4 quantization format. NVFP4 reduces memory bandwidth and storage usage while maintaining model accuracy; it is compatible with models optimized by NVIDIA Model Optimizer and consistent with the formats used by major cloud inference providers.

The caching system has been upgraded as well: caches can now be reused across sessions (with tools such as Claude Code, shared system prompts achieve higher cache hit rates), snapshots are stored at key positions in prompts to reduce redundant processing, and cache eviction strategies are smarter.

This preview requires a Mac with more than 32GB of unified memory. The model optimized specifically for programming tasks is Qwen3.5-35B-A3B, which can be used with Claude Code via "ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4".
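The "nearly doubled" claim can be checked with quick arithmetic on the reported benchmark numbers (all figures taken directly from the article):

```python
# Reported Ollama 0.19 throughput on M5-series chips, NVFP4 quantization.
# Each entry maps a stage to (tokens/s before, tokens/s after) the MLX rebuild.
nvfp4 = {
    "prefill": (1154, 1810),
    "decode": (58, 112),
}

for stage, (before, after) in nvfp4.items():
    speedup = after / before
    print(f"{stage}: {speedup:.2f}x speedup")

# prefill: 1.57x speedup
# decode: 1.93x speedup  -> consistent with "nearly doubling"
```

At int4 precision the article's figures (1851 tokens/s prefill, 134 tokens/s decode) correspond to roughly 1.60x and 2.31x over the original NVFP4 baseline.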