Opus 4.7 low thinking-depth surpasses Sonnet 4.6’s maximum value; Anthropic releases its first agent control and operation tuning guide

robot
Abstract generation in progress

AIMPACT News, May 20 (UTC+8), according to Beating monitoring, Anthropic released its first official development guide, deeply disclosing the resolution limits, thinking depth allocation, and cache cost reduction mechanisms of Claude 4.6 and Opus 4.7 in computer and browser control scenarios.

Screen resolution directly determines the precision of an agent's clicks. The long-edge limit for Claude 4.6's screenshot parsing is 1568 pixels, while for Opus 4.7 it is 2576 pixels. Once a screenshot exceeds the limit, the API server automatically scales down the image proportionally, causing misalignment and drift between the model's click coordinates and the client's original image. Therefore, developers must pre-scale screenshots on the client side to 1280x720 (recommended for Claude 4.6) or 1080p (recommended for Opus 4.7).

Interface control mainly relies on visual perception and element positioning, with low demand for long-chain logical reasoning. Tests show that Opus 4.7 at low thinking depth (low) achieves control performance comparable to Sonnet 4.6 at maximum thinking depth (max), with token costs only one-tenth of the latter. The official recommendation is to set the thinking option to high, which halves token consumption compared to max depth while maintaining the same success rate. Avoid enabling max to prevent the model from overthinking and doubling costs.

Since a single screenshot consumes up to 1800 tokens in the context, the official provides a three-tier cost reduction plan: maintain one system-level cache breakpoint and dynamically assign the other three breakpoints to the execution results of recent rounds of tools; perform rolling pruning on the client side, retaining only the latest three screenshots in the context and replacing the rest with placeholders; trigger summary compression when context depth approaches 90%.

In addition, the API introduces a batch tool, computer_batch, which supports packaging multiple operations without visual dependencies into a single call; and provides an advisor mechanism (Advisor Tool), allowing the main model to directly summon a higher-level Opus model in the background to audit execution steps. Developers can also significantly improve task success rates by using the Teach Mode (recording real user operation trajectories and using them as instruction references during playback). (Source: BlockBeats)

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments
  • Pinned