Humans Solve 100%, AI Peaks at 0.37%: ARC-AGI-3 Tests Agents' True Intelligence with "Unknown Games"

2026-03-26 18:58

ChainThink report, March 26, 2026: According to 1M AI News monitoring, the ARC Prize Foundation, a non-profit organization co-founded by Keras creator François Chollet and Zapier co-founder Mike Knoop, has launched the ARC-AGI-3 benchmark.


Unlike the static grid-reasoning tasks of its predecessors, ARC-AGI-3 is an interactive, turn-based environment in which agents operate within a 64×64, 16-color grid world. Given no instructions or goal hints, agents must explore the environment on their own, infer its underlying rules and win conditions, build a world model, and plan action sequences. Scoring is based on "action efficiency": the fewer steps an agent needs to complete a level, the higher its score, which separates genuine reasoning from brute-force enumeration. All environments were calibrated through human testing, confirming that humans can solve 100% of them on first exposure.
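
To make the "action efficiency" idea concrete, here is a minimal Python sketch. Only the 64×64 grid size and the 16-color palette come from the report; the ToyEnv class, its action names, the hidden "reach the goal cell" rule, and the scoring formula (human baseline steps divided by agent steps, capped at 1) are all hypothetical stand-ins for illustration, not the real ARC-AGI-3 interface or its official scoring rule.

```python
from dataclasses import dataclass

GRID_SIZE = 64    # from the report: 64x64 grid world
NUM_COLORS = 16   # from the report: cell values are one of 16 colors

# Hypothetical action set; the real benchmark's actions are not described in the report.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


@dataclass
class ToyEnv:
    """Hypothetical toy environment, NOT the real ARC-AGI-3 API.

    Hidden rule: the level is won when the cursor reaches `goal`.
    The agent is never told this; it only receives the grid.
    """
    goal: tuple = (3, 2)
    pos: tuple = (0, 0)
    steps: int = 0

    def observe(self):
        # Build a 64x64 grid of color indices in the range 0..NUM_COLORS-1.
        grid = [[0] * GRID_SIZE for _ in range(GRID_SIZE)]
        grid[self.goal[1]][self.goal[0]] = 5   # goal cell (arbitrary color)
        grid[self.pos[1]][self.pos[0]] = 9     # cursor cell (arbitrary color)
        return grid

    def step(self, action):
        # Apply one move, clamped to the grid, and count it as one action.
        dx, dy = MOVES[action]
        x = min(max(self.pos[0] + dx, 0), GRID_SIZE - 1)
        y = min(max(self.pos[1] + dy, 0), GRID_SIZE - 1)
        self.pos = (x, y)
        self.steps += 1
        return self.observe(), self.pos == self.goal


def run_episode(env, actions):
    """Play actions in order; return steps used if the level is won, else None."""
    for action in actions:
        _, won = env.step(action)
        if won:
            return env.steps
    return None


def efficiency_score(agent_steps, human_steps):
    """Assumed formula: ratio of human baseline steps to agent steps, capped at 1."""
    if agent_steps is None:
        return 0.0
    return min(1.0, human_steps / agent_steps)


direct = ["right"] * 3 + ["down"] * 2                              # 5 actions
wasteful = ["left"] + ["down"] * 4 + ["up"] * 2 + ["right"] * 3    # 10 actions

if __name__ == "__main__":
    human_baseline = 5  # hypothetical human step count for this toy level
    print(efficiency_score(run_episode(ToyEnv(), direct), human_baseline))
    print(efficiency_score(run_episode(ToyEnv(), wasteful), human_baseline))
```

Under these assumptions, both action sequences finish the level, but the one that wanders before reaching the goal scores half as much as the direct one, and a run that never finishes scores zero. That is the sense in which a step-based score penalizes trial-and-error enumeration.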


At launch, the top-performing AI models scored as follows: Google Gemini 3.1 Pro Preview 0.37%, OpenAI GPT 5.4 (High) 0.26%, Anthropic Opus 4.6 (Max) 0.25%, and xAI Grok-4.20 (Beta) 0.00%.


The new version was motivated in part by concerns that earlier benchmarks had been contaminated. Gemini 3 had previously used ARC-AGI's integer-to-color mapping in its reasoning chains even though the prompts never mentioned it, indicating that ARC-AGI tasks were already well covered by the model's training data. ARC-AGI-3 resists such memorization shortcuts through its interactive environments and the requirement that agents discover goals on their own. The ARC Prize 2026 competition offers more than $2 million in total prize money.

Disclaimer: Contains third-party opinions, does not constitute financial advice
