Google's Vision Banana: A "GPT-3 moment" for computer vision? An image generation model beats specialized visual understanding models
ME News, April 23 (UTC+8): According to Beating monitoring, a Google team whose authors include Kaiming He and Saining Xie has published a paper proposing Vision Banana. The team applied lightweight instruction fine-tuning to Google's image generation model Nano Banana Pro (i.e., Gemini 3 Pro Image), turning it into a general-purpose visual understanding model. The core idea is to express the output of every visual task as an RGB image, so that perception tasks such as segmentation, depth estimation, and surface normal estimation are all completed through image generation, with no dedicated architecture or training loss for each task.
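To make the "every output is an RGB image" idea concrete, here is a minimal sketch of how a segmentation result could be written as colors and read back. The palette, the class names, and the nearest-color decoding rule are assumptions for illustration only; the article does not describe the encoding the paper actually uses.

```python
# Hypothetical palette: the article does not say how classes map to colors.
PALETTE = {
    "road": (128, 64, 128),
    "pedestrian": (220, 20, 60),
    "vehicle": (0, 0, 142),
}

def encode_mask(labels):
    """Turn a 2D grid of class names into a 2D grid of RGB tuples."""
    return [[PALETTE[name] for name in row] for row in labels]

def nearest_class(pixel):
    """Map an RGB pixel to the class whose palette color is closest."""
    def sq_dist(color):
        return sum((a - b) ** 2 for a, b in zip(pixel, color))
    return min(PALETTE, key=lambda name: sq_dist(PALETTE[name]))

def decode_mask(image):
    """Recover class names from an RGB image, tolerating small color errors."""
    return [[nearest_class(px) for px in row] for row in image]

labels = [["road", "road", "vehicle"],
          ["road", "pedestrian", "vehicle"]]
image = encode_mask(labels)

# Simulate a generated image whose colors are slightly off the exact palette.
noisy = [[(min(255, r + 3), max(0, g - 2), b) for (r, g, b) in row]
         for row in image]
decoded = decode_mask(noisy)
```

Decoding by nearest color is one plausible way to tolerate the small color deviations a generative model might produce; a real system could use a different scheme.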
The evaluation covers two major categories: image segmentation and 3D geometric inference. In segmentation, Vision Banana outperforms the dedicated segmentation model SAM 3 by 4.7 percentage points on Cityscapes for semantic segmentation (labeling each pixel with a category such as "road," "pedestrian," or "vehicle"). It also surpasses SAM 3 Agent in referring expression segmentation (finding and segmenting the object that matches a natural language description, e.g., "the dog wearing a hat on the left"). It still lags behind SAM 3, however, in instance segmentation (distinguishing individual objects of the same category, e.g., labeling five dogs in an image separately). In 3D, metric depth estimation (inferring the actual physical distance from each pixel to the camera from a single photo) reaches an average accuracy of 0.929 across four standard datasets, higher than the 0.918 of the dedicated model Depth Anything V3. The model is trained for this task entirely on synthetic data, with no real depth data, and needs no camera parameters at inference time. Surface normal estimation (inferring the orientation of object surfaces) achieves state-of-the-art results on three indoor benchmarks.
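The article does not say which metric lies behind the depth "accuracy" figures. One metric commonly reported on depth benchmarks is threshold accuracy, the fraction of pixels whose predicted depth is within a factor of 1.25 of the true depth. The sketch below assumes that metric purely for illustration; the function name and sample values are hypothetical.

```python
def threshold_accuracy(pred, truth, threshold=1.25):
    """Fraction of valid pixels where max(pred/truth, truth/pred) < threshold."""
    hits = 0
    total = 0
    for p, t in zip(pred, truth):
        if p <= 0 or t <= 0:
            continue  # skip pixels without a valid positive depth
        total += 1
        if max(p / t, t / p) < threshold:
            hits += 1
    return hits / total if total else 0.0

# Depths in meters for four pixels (made-up values).
pred = [1.0, 2.0, 4.0, 10.0]
truth = [1.1, 2.0, 8.0, 9.0]
acc = threshold_accuracy(pred, truth)  # 3 of 4 pixels are within the factor
```

Under this reading, a score of 0.929 would mean roughly 93% of pixels fall within the tolerance, averaged over the datasets.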
The fine-tuning only mixes a small amount of visual task data into the original image generation training data, and the model's image generation capability remains largely unaffected: it ties with the original Nano Banana Pro in generation quality evaluations. The paper argues that image generation pre-training plays the same role in vision that text generation pre-training plays in language: in learning to generate images, the model has already acquired the internal representations needed to understand them, and instruction fine-tuning simply unlocks those representations.
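The data mixing described above can be sketched as a batch sampler that draws a small, fixed share of each batch from the visual task data. The 12.5% share, the batch size, and the dataset contents are hypothetical; the article only says the amount of task data is "small."

```python
import random

def mixed_batch(generation_data, task_data, batch_size, task_fraction, rng):
    """Build one training batch with a fixed share of visual task examples."""
    n_task = round(batch_size * task_fraction)
    n_gen = batch_size - n_task
    batch = rng.choices(generation_data, k=n_gen) + rng.choices(task_data, k=n_task)
    rng.shuffle(batch)  # interleave the two sources within the batch
    return batch

# Placeholder datasets: tuples of (kind, example id).
generation_data = [("generate", i) for i in range(1000)]
task_data = [("segment", i) for i in range(50)]

rng = random.Random(0)
batch = mixed_batch(generation_data, task_data,
                    batch_size=32, task_fraction=0.125, rng=rng)
n_task_examples = sum(1 for kind, _ in batch if kind == "segment")
```

Keeping most of each batch as ordinary generation data is one simple way to limit drift in the original capability, which is consistent with the reported tie in generation quality.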
(Source: BlockBeats)