The ARC Prize Foundation has released the ARC-AGI-3 human performance dataset, with results from 458 participants tested on 135 abstract reasoning environments without instructions. Humans completed every environment, which the foundation cites as evidence that AGI has not yet been achieved. The foundation also adjusted its scoring rules, slightly raising both human and AI scores.

MeNews

2026-04-15 07:00:23

ME News, April 15 (UTC+8). According to Beating Monitoring, the ARC Prize Foundation has released the human performance dataset for ARC-AGI-3. With 458 participants, it is the largest human study in the ARC-AGI series to date. The dataset includes 342 complete human action replay records covering 25 public environments, all of which have been open-sourced. ARC-AGI-3 comprises 135 abstract reasoning environments. Testers receive no gameplay instructions; they must explore, infer the rules, and devise strategies on their own. Testing took place in person at a testing center in San Francisco, with each session lasting 90 minutes. Participants received about $130 in base pay plus a $5 bonus for each environment they completed. All tests were run under a "first-attempt completion" condition: each person sees and attempts each environment only once, so the results measure the ability to learn and adapt when facing entirely new problems. Humans and AI systems are given exactly the same information.

Core conclusion: humans have completed every environment in ARC-AGI-3. Each environment was completed by at least two independent participants, and most were completed by more than five. The ARC Prize Foundation states, "We have not yet achieved AGI, and this dataset is evidence."
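The coverage claim above (at least two independent solvers per environment) is the kind of statement that can be checked against replay records. The following is only a minimal sketch: the record layout `(participant_id, environment_id, completed)`, the function names, and the toy data are all hypothetical and are not taken from the released dataset's actual schema.

```python
from collections import defaultdict

def completions_per_environment(records):
    """Count distinct participants who completed each environment.

    `records` is an iterable of (participant_id, environment_id, completed)
    tuples. This layout is an assumption made for illustration only.
    """
    solvers = defaultdict(set)
    for participant_id, environment_id, completed in records:
        if completed:
            solvers[environment_id].add(participant_id)
    return {env: len(people) for env, people in solvers.items()}

def under_covered(records, all_environments, min_solvers=2):
    """Return the environments completed by fewer than `min_solvers` people."""
    counts = completions_per_environment(records)
    return sorted(env for env in all_environments
                  if counts.get(env, 0) < min_solvers)

# Toy data (made up): env "B" has only one successful participant.
toy_records = [
    ("p1", "A", True),
    ("p2", "A", True),
    ("p1", "B", True),
    ("p3", "B", False),
]
print(under_covered(toy_records, ["A", "B"]))
```

Using a set per environment means a participant is counted once even if a record is duplicated, which matches the "independent participants" wording.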

Since the ARC-AGI-3 preview launched, the public environments have received nearly 1 million AI evaluation submissions. Drawing on this data, the foundation has also announced two adjustments to its scoring rules. First, the human reference score for each level changes from the "second-best player" to the "median player," reducing the influence of luck on scores. Second, the per-level score cap rises from 100% to 115%, so that a single poor performance does not drag down the overall result. The net effect of the two adjustments is an increase of about 0.5 percentage points in both human and AI scores. (Source: BlockBeats)
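To make the two rule changes concrete, here is a rough sketch of how a per-level score might be computed under the old and new rules. The article does not give the foundation's actual formula, so the efficiency ratio used here (reference action count divided by the agent's action count) is an assumption, as are all function names and numbers. Only the switch from second-best to median and the cap change from 100% to 115% come from the article.

```python
from statistics import median

def reference_actions(human_action_counts, rule="median"):
    """Pick the human reference for one level (fewer actions is better)."""
    ranked = sorted(human_action_counts)
    if rule == "second_best":
        # Old rule: the second-most-efficient human; fall back if only one.
        return ranked[1] if len(ranked) > 1 else ranked[0]
    if rule == "median":
        # New rule: the middle of the human distribution.
        return median(ranked)
    raise ValueError(f"unknown rule: {rule}")

def level_score(agent_actions, reference, cap=1.15):
    """Hypothetical score: reference / agent actions, clipped at `cap`."""
    if agent_actions <= 0:
        raise ValueError("agent_actions must be positive")
    return min(cap, reference / agent_actions)

def overall_score(agent_counts, human_counts_per_level, rule, cap):
    """Average the per-level scores across all levels."""
    scores = [
        level_score(agent, reference_actions(humans, rule), cap)
        for agent, humans in zip(agent_counts, human_counts_per_level)
    ]
    return sum(scores) / len(scores)

# Made-up numbers: five humans on one level, and an agent using 30 actions.
humans = [20, 24, 30, 36, 50]
old = level_score(30, reference_actions(humans, "second_best"), cap=1.0)
new = level_score(30, reference_actions(humans, "median"), cap=1.15)
print(old, new)
```

In this sketch the median is less sensitive than the second-best player to one or two unusually lucky runs. A cap above 100% lets a level where the agent beats the reference partly offset a weak level in the average, which is one plausible reading of the stated motivation.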

ARC-AGI-3 releases its largest human study to date: humans have completed every level, while AI still falls short
