Poetiq, a six-person team, has posted the highest score on LiveCodeBench Pro with its Meta-System. The system is accessed purely through APIs and uses recursive self-improvement to extract task experience, without touching model weights or fine-tuning, and it significantly lifts weaker models. With it, Kimi K2.6 rose from 50.0% to 79.9%, and Gemini 3.0 Flash gained 10 points, surpassing Gemini 3.1 Pro, Claude Opus 4.7, and GPT 5.2 High. GPT 5.5 High reached 93.9% with the system, and Gemini 3.1 Pro paired with it scored 90.9%, surpassing Gemini 3 Deep Think. Poetiq says companies can improve reasoning ability without costly fine-tuning.

MeNews

2026-05-24 08:07:37

AIMPACT News, May 15 (UTC+8): According to Beating Monitoring, Poetiq, a six-person startup founded by former Google and DeepMind researchers Shumeet Baluja and Ian Fischer, announced that its Meta-System has set a new record on the programming benchmark LiveCodeBench Pro. The system is a harness accessed purely through model APIs that automatically extracts task experience through recursive self-improvement. According to Poetiq's own tests, it boosts the coding performance of mainstream large models on the market without touching model weights or fine-tuning.

The results indicate that this decoupled external system significantly improves weaker models. With Poetiq, Kimi K2.6's accuracy rose from 50.0% to 79.9%, an absolute gain of 29.9 percentage points. The lightweight Gemini 3.0 Flash improved by 10 points, surpassing its larger sibling Gemini 3.1 Pro and, according to Poetiq, also beating the "bigger and more expensive" Claude Opus 4.7 and GPT 5.2 High.

At the top end, GPT 5.5 High, which scored 89.6% on its own, reached 93.9% with the external system. The base Gemini 3.1 Pro paired with the system scored 90.9%, surpassing Gemini 3 Deep Think (88.8%), Google's most powerful reasoning model, which does not yet have an open API. The Poetiq team said that traditional fine-tuning locks its gains into a single model, whereas its plug-and-play external system lets enterprises gain reasoning capability without the high cost of fine-tuning or of deploying full-size models. (Source: BlockBeats)
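For readers who want to compare the reported gains on a common footing, here is a minimal sketch that recomputes them from the scores quoted in the article. The function name `gains` and the "error reduction" metric are illustrative choices made here, not something Poetiq or the benchmark reports; only the four percentages come from the article.

```python
# Scores quoted in the article, in percent: (model alone, model with the Poetiq system).
REPORTED = {
    "Kimi K2.6": (50.0, 79.9),
    "GPT 5.5 High": (89.6, 93.9),
}


def gains(before: float, after: float) -> dict:
    """Return the absolute gain in percentage points and the relative
    reduction in failed tasks (illustrative metric, not from the article)."""
    absolute = after - before
    error_before = 100.0 - before
    error_after = 100.0 - after
    error_reduction = 100.0 * (error_before - error_after) / error_before
    return {
        "absolute_pp": round(absolute, 1),
        "error_reduction_pct": round(error_reduction, 1),
    }


if __name__ == "__main__":
    for model, (before, after) in REPORTED.items():
        print(model, gains(before, after))
```

The absolute gain is much larger for Kimi K2.6 than for GPT 5.5 High, but a model that already scores near 90% has far fewer failed tasks left to recover, which is why the sketch also reports the share of remaining errors removed.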

SeaSaltAirdropNotes

· 8h ago

A pure API plugin that gets this much out of weak models is wild, and it saves companies a lot of money.

StargazerInTheWoods

· 9h ago

This Meta-System is like giving the model an external brain, and experience reuse is done really well.

RugpullTaster

· 9h ago

Matching Deep Think without fine-tuning: small and medium-sized companies must be ecstatic.

TreatEarningsAsSnacks

· 9h ago

A six-person team outperforming the fine-tuning departments of a bunch of major companies. The irony is thick.

MarginMoth

· 9h ago

Claude Opus 4.7 got crushed by Flash plus a plugin. Anthropic should do some soul-searching.

BridgeSideBanter

· 9h ago

Recursively extracting experience sounds like the model is doing its own prompt engineering.

GovernanceGremlin

· 9h ago

Weak models get stronger with a plugin. Is the democratization of computing power truly here?

FloatingTeacupClub

· 9h ago

GPT 5.5 High is already at 93.9%, and the ceiling is still moving upward.

No weight changes, API only: Poetiq "plugin" lifts Kimi by 29.9 percentage points, and lightweight Gemini overtakes Claude Opus