No weight changes, API-only tuning: Poetiq "plugin" lifts Kimi by 29.9 percentage points, lightweight Gemini overtakes Claude Opus

AIMPACT News, May 15 (UTC+8): According to Beating Monitoring, Poetiq, a six-person startup founded by former Google and DeepMind researchers Shumeet Baluja and Ian Fischer, has announced that its Meta-System set a new record on the programming benchmark LiveCodeBench Pro. The system is a harness accessed purely through model APIs that automatically extracts experience from tasks through recursive self-improvement. According to Poetiq's own tests, it raises the coding performance of mainstream large models without touching model weights or fine-tuning.

The results indicate that this decoupled external system significantly improves weaker models. With Poetiq integrated, Kimi K2.6's accuracy rose from 50.0% to 79.9%, an absolute gain of 29.9 percentage points. The lightweight Gemini 3.0 Flash improved by 10 points, surpassing its larger sibling Gemini 3.1 Pro and, according to Poetiq, even beating the "bigger and more expensive" Claude Opus 4.7 and GPT 5.2 High.

At the top end, GPT 5.5 High, which scored 89.6% on its own, reached 93.9% with the external system. The base Gemini 3.1 Pro paired with the system scored 90.9%, ahead of Gemini 3 Deep Think (88.8%), Google's most powerful reasoning model, whose API is not yet open. The Poetiq team said that traditional fine-tuning locks its gains into a single model, whereas their plug-and-play external system lets enterprises gain reasoning capability without the high cost of fine-tuning and deploying full-scale models. (Source: BlockBeats)
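The gains quoted above are percentage-point differences, which can be checked with a few lines of arithmetic. The sketch below uses only the before/after scores reported in the article. The `error_reduction` helper is an illustrative assumption, not a metric Poetiq reports: it expresses each gain as a share of the errors the base model still made.

```python
# Before/after accuracy (percent) on LiveCodeBench Pro, as reported in the article.
REPORTED = {
    "Kimi K2.6": (50.0, 79.9),
    "GPT 5.5 High": (89.6, 93.9),
}


def absolute_gain(before: float, after: float) -> float:
    """Gain in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


def error_reduction(before: float, after: float) -> float:
    """Share (in percent) of the base model's remaining errors that disappeared.

    Hypothetical helper for illustration; the article does not report this figure.
    """
    return round((after - before) / (100.0 - before) * 100.0, 1)


if __name__ == "__main__":
    for model, (before, after) in REPORTED.items():
        print(
            f"{model}: +{absolute_gain(before, after)} points, "
            f"{error_reduction(before, after)}% of remaining errors removed"
        )
```

Read this way, the 4.3-point gain for GPT 5.5 High is smaller in absolute terms than Kimi K2.6's 29.9 points, but it still removes a large share of the errors that were left.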
SeaSaltAirdropNotes
· 8h ago
A pure API plugin can get the most out of weak models; this approach is wild, and it saves companies a lot of money.
StargazerInTheWoods
· 9h ago
This Meta-System is like giving the model an external brain, and experience reuse is done really well.
RugpullTaster
· 9h ago
Matching Deep Think without fine-tuning; small and medium-sized companies must be ecstatic.
TreatEarningsAsSnacks
· 9h ago
A six-person team outperforming the fine-tuning departments of a bunch of major companies; the irony is thick.
MarginMoth
· 9h ago
Claude Opus 4.7 got crushed by Flash plus a plugin; Anthropic should do some soul-searching.
BridgeSideBanter
· 9h ago
Recursively extracting experience sounds like the model is writing its own prompt engineering.
GovernanceGremlin
· 9h ago
Weak models get stronger with a plugin; is the democratization of computing power truly here?
FloatingTeacupClub
· 9h ago
GPT 5.5 High is already at 93.9%, and the ceiling is still moving upward.