According to Beating monitoring, Cursor has disclosed autoinstall, a technique used in training Composer in which the previous-generation model automatically builds runnable RL environments. The process has two steps: first, one agent proposes 10 verification commands with their expected outputs; then a second agent takes 3 of those commands and sets up the environment from scratch, retrying up to 5 times and discarding the environment if every attempt fails. During setup the agent fills in missing dependencies, such as fake databases, MinIO, Docker sidecars, and placeholder images. In the celo-monorepo example, the first setup attempt failed, and a second round created mock users to bypass authentication and ultimately ran successfully. Composer 2 scored 61.7% on Terminal-Bench, up from 47.9% for Composer 1.5. Cursor plans to involve older Composer versions in more training stages in the future.

MarsBitNews

2026-05-07 11:18:00


According to Beating monitoring, Cursor has revealed a training trick for its Composer series of models: using the previous-generation model to automatically build runnable environments for the reinforcement learning (RL) of the next generation. When training Composer 2, Cursor used Composer 1.5 for this task and calls the technique autoinstall.

RL training requires a runnable code environment. If the environment is set up poorly, the model wastes tokens fixing bugs and cannot learn effectively; in the extreme case, if the environment fails completely, the training compute spent on it is wasted entirely.

autoinstall addresses this in two steps. First, an agent reads the codebase's documentation and configuration and proposes 10 verification commands with their expected outputs. Second, another agent takes 3 of these commands and sets up the environment from scratch until the commands run successfully. The second step is retried up to 5 times; if every attempt fails, the environment is discarded.

During environment setup, the agent actively fills in missing dependencies: fabricating database tables, creating MinIO configurations as a substitute for S3, launching Docker containers as sidecar services, and even generating placeholder images. Cursor's blog post walks through the whole process using the blockchain project celo-org/celo-monorepo as an example: after the first setup attempt fails, the second round creates mock users to bypass authentication and ultimately passes the tests.

Composer 2 scored 61.7% on Terminal-Bench (a benchmark that tests a model's ability to set up and work in development environments), nearly 14 percentage points higher than Composer 1.5's 47.9%. Cursor says it plans to involve older Composer versions in more training stages, including data preprocessing, runtime management, and architecture tuning.
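The two-step loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Cursor's implementation: the names `Check`, `run_check`, and `autoinstall` are hypothetical, the two agents are stood in for by plain callables (`propose` and `setup`), the 3 commands are picked at random because the report does not say how they are chosen, and "retries up to 5 times" is read as at most 5 setup attempts in total.

```python
import random
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    command: str   # verification command proposed by the first agent
    expected: str  # text the command's output is expected to contain


def run_check(check: Check, timeout: int = 60) -> bool:
    """Run one command in a shell; pass if it exits 0 and prints the expected text."""
    try:
        result = subprocess.run(
            check.command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and check.expected in result.stdout


def autoinstall(
    propose: Callable[[], List[Check]],
    setup: Callable[[List[Check], int], None],
    runner: Callable[[Check], bool] = run_check,
    n_selected: int = 3,
    max_attempts: int = 5,  # assumption: 5 attempts in total
    seed: int = 0,
) -> Optional[List[Check]]:
    """Return the checks that passed, or None if the environment is discarded."""
    # Step 1: the first agent proposes verification commands with expected outputs.
    proposed = propose()
    # Step 2: a subset becomes the setup target (random choice is an assumption).
    selected = random.Random(seed).sample(proposed, n_selected)
    for attempt in range(1, max_attempts + 1):
        setup(selected, attempt)  # second agent builds the environment from scratch
        if all(runner(check) for check in selected):
            return selected       # environment is usable for RL training
    return None                   # every attempt failed: discard the environment
```

In this sketch the discard path is an explicit `None`, so a caller can drop the repository from the training pool instead of training against a broken environment.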




Cursor discloses "bootstrap" training method: the old Composer sets up environments for the new model, and the Terminal-Bench score rises by nearly 14 points
