Opus 4.7 at low thinking depth surpasses Sonnet 4.6 at max as Anthropic releases its first agent control tuning guide

AIMPACT News, May 20 (UTC+8). According to Beating monitoring, Anthropic has released its first official developer guide, detailing the resolution limits, thinking-depth settings, and cache-based cost-reduction mechanisms of Claude 4.6 and Opus 4.7 in computer and browser control scenarios.

Screenshot resolution directly determines how precisely an agent clicks. Claude 4.6 parses screenshots up to 1568 pixels on the long side, and Opus 4.7 up to 2576 pixels. When a screenshot exceeds the limit, the API server automatically scales it down proportionally, so the click coordinates the model returns refer to the downscaled image and drift when applied to the original image on the client. Developers must therefore downscale screenshots on the client in advance, to 1280x720 (recommended for Claude 4.6) or 1080p (recommended for Opus 4.7).
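The client-side scaling described above comes down to two pieces of arithmetic: shrink the screenshot so its long side fits a target, then divide the model's click coordinates by the same factor before acting on the real screen. The following is a minimal sketch of that arithmetic only. The function names `fit_long_side` and `to_screen_coords` are hypothetical and do not come from the guide, and the actual image resizing (for example with an imaging library) is left out.

```python
from typing import Tuple


def fit_long_side(width: int, height: int, max_long_side: int) -> Tuple[int, int, float]:
    """Return (new_width, new_height, scale) so the long side fits max_long_side.

    Hypothetical helper: the aspect ratio is preserved and images that
    already fit are returned unchanged with a scale of 1.0.
    """
    long_side = max(width, height)
    if long_side <= max_long_side:
        return width, height, 1.0
    scale = max_long_side / long_side
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height, scale


def to_screen_coords(x: int, y: int, scale: float) -> Tuple[int, int]:
    """Map a click on the downscaled screenshot back to real screen pixels."""
    return round(x / scale), round(y / scale)


# Example: a 2560x1440 display downscaled to the 1280x720 target.
new_w, new_h, scale = fit_long_side(2560, 1440, max_long_side=1280)
# A click the model places at (640, 360) on the downscaled image
# corresponds to (1280, 720) on the real screen.
click = to_screen_coords(640, 360, scale)
```

Because the client picks the scale factor itself, it always knows how to map coordinates back, which is not the case when the server rescales an oversized image.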

Interface control relies mainly on visual perception and element locating, and places relatively low demands on long-chain logical reasoning. Tests show that Opus 4.7 at low thinking depth (low) already matches Sonnet 4.6 at maximum thinking depth (max) in control performance, at only one-tenth of the token cost.

The official recommendation is to set the thinking option to high. Compared with max, this halves token consumption while keeping the success rate the same. Developers should avoid max, which lets the model overthink and doubles the bill.

Because a single screenshot consumes up to 1800 tokens of context, the guide offers a three-tier cost-reduction approach: keep 1 system-level cache breakpoint resident and dynamically assign the other 3 breakpoints to the results of the most recent tool runs; apply rolling pruning on the client, keeping only the 3 most recent screenshots in context and replacing the rest with placeholders; and trigger summary compression when context usage approaches 90%.
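The three tiers above can be sketched on the client as plain list manipulation. The example below is an illustration under stated assumptions, not the guide's implementation: it assumes the conversation is a list of message dicts whose screenshots arrive as `image` blocks nested inside `tool_result` blocks, and that a cache breakpoint is expressed as a `cache_control` field of type `ephemeral` on a block. The function names, the placeholder wording, and the token figures in the test values are hypothetical. The resident system-level breakpoint is assumed to sit on the system prompt and is not shown.

```python
import copy
from typing import Any, Dict, Iterator, List

Message = Dict[str, Any]
PLACEHOLDER_TEXT = "[screenshot omitted]"  # hypothetical placeholder wording


def _tool_results_newest_first(messages: List[Message]) -> Iterator[Dict[str, Any]]:
    """Yield tool_result blocks, walking from the newest message to the oldest."""
    for message in reversed(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content holds no tool results
        for block in reversed(content):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                yield block


def prune_screenshots(messages: List[Message], keep_last: int = 3) -> List[Message]:
    """Return a copy in which only the newest keep_last screenshots survive."""
    result = copy.deepcopy(messages)
    kept = 0
    for block in _tool_results_newest_first(result):
        items = block.get("content")
        if not isinstance(items, list):
            continue
        for i in range(len(items) - 1, -1, -1):
            item = items[i]
            if isinstance(item, dict) and item.get("type") == "image":
                if kept < keep_last:
                    kept += 1
                else:
                    # Older screenshot: swap it for a cheap text placeholder.
                    items[i] = {"type": "text", "text": PLACEHOLDER_TEXT}
    return result


def mark_recent_tool_results(messages: List[Message], breakpoints: int = 3) -> List[Message]:
    """Return a copy with cache_control set only on the newest tool results."""
    result = copy.deepcopy(messages)
    for n, block in enumerate(_tool_results_newest_first(result)):
        if n < breakpoints:
            block["cache_control"] = {"type": "ephemeral"}
        else:
            block.pop("cache_control", None)  # free the breakpoint for newer results
    return result


def should_compact(used_tokens: int, context_window: int, threshold: float = 0.9) -> bool:
    """True once context usage reaches the given fraction of the window."""
    return used_tokens >= threshold * context_window
```

A client loop under these assumptions would call `prune_screenshots` and `mark_recent_tool_results` before each request, and switch to a summarization step once `should_compact` returns true. Both list functions return copies, so the full history stays available locally for logging or replay.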

In addition, the API introduces a batch tool, computer_batch, which packs multiple operations that have no visual dependencies into a single call. It also provides an advisor mechanism (Advisor Tool) that lets the main model call a higher-tier Opus model in the background to audit execution steps. Developers can further raise task success rates significantly with Teach Mode, which records the user's real operation trajectory and uses it as an instruction reference during replay.

(Source: BlockBeats)
