
Meituan Open-Sources 560B-Parameter Theorem-Proving Model: 97.1% Success Rate in 72 Inference Attempts, Setting New SOTA for Open-Source Models

2026-03-21 10:22

According to 1M AI News monitoring, the Meituan LongCat team has open-sourced LongCat-Flash-Prover, a 560B-parameter MoE model specialized in mathematical reasoning tasks for the formal theorem proving language Lean4. The model weights are released under the MIT license and are now available on GitHub, Hugging Face, and ModelScope.

The model decomposes formal reasoning into three distinct capabilities: autoformalization (translating natural-language math problems into Lean4 formal statements), sketch generation (producing lemma-style proof skeletons), and full proof generation. Each capability is paired with agent-based tool use built on tool-integrated reasoning (TIR), allowing the model to interact with the Lean4 compiler in real time for validation.
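
A toy illustration of the three stages in Lean4 (a hypothetical example for exposition, not output from the model):

```lean
-- Stage 1, autoformalization: the informal problem
--   "For every natural number n, n + 0 = n"
-- becomes a formal Lean4 statement.

-- Stage 2, sketch generation: a proof skeleton whose subgoal is
-- left as `sorry`, to be filled in later.
theorem add_zero_sketch (n : Nat) : n + 0 = n := by
  sorry

-- Stage 3, full proof generation: the completed proof, which the
-- Lean4 compiler can check end to end.
theorem add_zero_proved (n : Nat) : n + 0 = n := by
  rfl
```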


For training, the team introduces a Hybrid-Experts Iteration Framework to generate cold-start data. In the reinforcement learning phase, it applies the HisPO algorithm to stabilize long-horizon task training in the MoE model, and adds theorem-consistency and validity checks to prevent reward hacking.
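
The reward-hacking guard can be sketched as a binary, compiler-verified reward: a candidate earns reward only if the Lean4 compiler accepts the proof and the proved statement matches the one originally asked. This is a minimal illustrative sketch (names and logic are assumptions, not LongCat's actual implementation):

```python
def proof_reward(original_stmt: str, proved_stmt: str, compiler_ok: bool) -> float:
    """Return 1.0 only if the candidate proof compiles AND proves the
    original theorem.

    - compiler_ok: did the Lean4 compiler accept the candidate proof?
    - consistency: the statement actually proved must equal the statement
      that was posed, so the policy cannot earn reward by silently
      weakening or rewriting the goal.
    """
    stmt_consistent = original_stmt.strip() == proved_stmt.strip()
    return 1.0 if (compiler_ok and stmt_consistent) else 0.0
```

In practice a consistency check would compare parsed Lean terms rather than raw strings, but the principle is the same: validity and consistency are both required before any reward is granted.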

Benchmark results show that LongCat-Flash-Prover sets new SOTA performance among open-weight models in both autoformalization and theorem proving. On MiniF2F-Test it attains a 97.1% pass rate within 72 inference attempts; on ProverBench and PutnamBench it scores 70.8% and 41.5% respectively, using no more than 220 inference attempts per problem.
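
The "97.1% within 72 inference attempts" figure is a pass@k-style result: a problem counts as solved if any one of k sampled proofs verifies. Assuming the standard unbiased estimator is used (the article does not specify how the number was computed), it looks like this:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n total attempts per problem of
    which c succeeded, estimate the probability that at least one of k
    randomly drawn attempts succeeds."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 2 attempts of which c = 1 succeeded, a single draw (k = 1) succeeds with probability 0.5.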

#Reinforcement Learning

Disclaimer: Contains third-party opinions, does not constitute financial advice

Recommended Reading
China's Open-Source Models Reemerge as Global AI Foundation: The Cursor "Shell-Skimming" Controversy Over Kimi Reverses Course—from Infringement Allegations to Licensing Partnership
Cursor's "in-house" Composer 2 Embeds Kimi K2.5; Moonshot AI Accuses: Violation of License Terms and Unpaid Fees
Model Iterating on Itself: MiniMax M2.7 Launches, Bringing Coding Capabilities Close to Opus Level
Following Baidu, Tencent has also become a sponsor of OpenClaw
GitHub Data Reveals Massive Exodus of Crypto Developers Since 2022, With AI Emerging as the New Destination for Developer Migration
Karpathy Open Sources AutoResearch: Sleep One Night, Let AI Automatically Run 100 LLM Training Iterations
Alibaba-Affiliated Research Team: AI Agent ROME Initiates Cryptocurrency Mining Without Authorization