Poetiq, a six-person team, has set a new high score on LiveCodeBench Pro with its Meta-System. The system is a pure API plugin that recursively improves itself by extracting task experience, without touching model weights or fine-tuning, and it gives weaker models a significant boost. With the plugin, Kimi K2.6 rose from 50.0% to 79.9%, and Gemini 3.0 Flash gained 10 points to surpass Gemini 3.1 Pro, Claude Opus 4.7, and GPT 5.2 High. GPT 5.5 High reached 93.9% and Gemini 3.1 Pro reached 90.9%, surpassing Gemini 3 Deep Think. Poetiq says enterprises can improve reasoning capabilities without costly fine-tuning.

MeNews

2026-05-24 01:02:07


AIMPACT News, May 15 (UTC+8): According to Beating Monitoring, Poetiq, a six-person startup founded by former Google and DeepMind researchers Shumeet Baluja and Ian Fischer, announced that its Meta-System has set a new record on the programming benchmark LiveCodeBench Pro. The system is a plugin (harness) accessed purely through APIs that recursively improves itself by automatically extracting experience from tasks. According to Poetiq's own tests, it improves the coding performance of mainstream large models without touching model weights or fine-tuning.
The test results indicate that this decoupled plugin approach gives weaker models a large boost. With the Poetiq system, Kimi K2.6's accuracy rose from 50.0% to 79.9%, an absolute gain of 29.9 percentage points. The lightweight Gemini 3.0 Flash improved by 10 points and, according to Poetiq, not only surpassed its larger sibling Gemini 3.1 Pro but also beat the "bigger and more expensive" Claude Opus 4.7 and GPT 5.2 High.
At the top end, GPT 5.5 High, which scores 89.6% on its own, reached 93.9% with the plugin. The base Gemini 3.1 Pro scored 90.9% with the plugin, surpassing Gemini 3 Deep Think (88.8%), Google's most powerful reasoning model, whose API is not yet open.
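The reported gains can be checked with a few lines of arithmetic. The sketch below uses only the two before/after pairs quoted in the article; the function names are illustrative, and the "error reduction" figure is an added way of reading the numbers that assumes both scores were measured on the same problem set.

```python
# Before/after accuracy (percent) as quoted in the article.
scores = {
    "Kimi K2.6": (50.0, 79.9),
    "GPT 5.5 High": (89.6, 93.9),
}


def gain_points(before: float, after: float) -> float:
    """Absolute gain in percentage points."""
    return round(after - before, 1)


def error_reduction(before: float, after: float) -> float:
    """Percent of the original failure rate that the gain removes."""
    return round(100 * (after - before) / (100 - before), 1)


for name, (before, after) in scores.items():
    print(f"{name}: +{gain_points(before, after)} points, "
          f"{error_reduction(before, after)}% fewer failures")
```

Read this way, the smaller absolute gain for GPT 5.5 High (4.3 points) still removes roughly two fifths of its remaining failures, because it starts much closer to the ceiling.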
The Poetiq team said that traditional fine-tuning ties its gains to a single model, whereas its plug-and-play system lets enterprises avoid the high cost of fine-tuning and deploying full-size models just to gain reasoning capability.
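Poetiq has not published how the Meta-System works, so the following is only a minimal sketch of the general idea of an API-level harness: it wraps any text-in, text-out model call, retries with feedback, and carries lessons from earlier tasks into later prompts. Every name here (ExperienceHarness, stub_model, check) is hypothetical, and the stub stands in for a real model API.

```python
from typing import Callable, List

Model = Callable[[str], str]  # hypothetical stand-in for any model API call


class ExperienceHarness:
    """Wraps a model callable; the model's weights are never touched."""

    def __init__(self, model: Model, max_attempts: int = 3) -> None:
        self.model = model
        self.max_attempts = max_attempts
        self.lessons: List[str] = []  # experience carried across tasks
        self.calls = 0                # number of model calls made so far

    def _prompt(self, task: str, failures: List[str]) -> str:
        lines = [f"Lesson from earlier tasks: {x}" for x in self.lessons]
        lines.append(f"Task: {task}")
        lines += [f"Previous attempt failed: {x}" for x in failures]
        return "\n".join(lines)

    def solve(self, task: str, check: Callable[[str], str]) -> str:
        """check returns "" on success, otherwise a failure message."""
        failures: List[str] = []
        answer = ""
        for _ in range(self.max_attempts):
            self.calls += 1
            answer = self.model(self._prompt(task, failures))
            problem = check(answer)
            if not problem:
                break
            failures.append(problem)
        # Keep what went wrong as reusable experience for later tasks.
        self.lessons.extend(x for x in failures if x not in self.lessons)
        return answer


def stub_model(prompt: str) -> str:
    # Toy model: it only answers correctly when the prompt contains a hint.
    return "a + b" if "use addition" in prompt else "a - b"


def check(answer: str) -> str:
    return "" if "+" in answer else "use addition"


harness = ExperienceHarness(stub_model)
first = harness.solve("add two numbers", check)     # needs one retry
second = harness.solve("add three numbers", check)  # first try, via the lesson
```

Because the harness only depends on the callable's signature, the same wrapper could in principle sit in front of different models, which is the property the article contrasts with fine-tuning.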
(Source: BlockBeats)

Comment


CandleChaser

· 3h ago

Does pairing a weak model with a strong external plugin mean that spring is coming for small models?


GateUser-b74aba1c

· 9h ago

A six-person team breaks through the large model inference bottleneck; API layer innovation has been underestimated for too long.


L2ArbitrageTrader

· 10h ago

Poetiq's six members achieved this result; the team composition is worth studying.


CyberBridgeDeepPerspective

· 10h ago

GPT 5.5 reached 93.9% and Gemini 3.1 Pro scored 90.9% once configured with the plugin; this mod beats the stock top-tier version.


GateUser-cbb8cdf5

· 10h ago

Companies will be ecstatic: no more spending money on fine-tuning.


BribeCoffee

· 10h ago

Recursive self-improvement + experience extraction, the Meta-System architecture is quite interesting


VineGeometry

· 10h ago

Pure API cheats that don't touch weights can boost Kimi from 50% to 79%, and this approach is much smarter than fine-tuning.



No weight adjustment, pure API tuning: Poetiq "plugin" boosts Kimi by 29.9 percentage points, lightweight Gemini counterattacks Claude Opus
