
Meituan Open-Sources 560B-Parameter Theorem-Proving Model: 97.1% Success Rate in 72 Inference Attempts, Setting New SOTA for Open-Source Models

2026-03-21 10:22

According to 1M AI News monitoring, the Meituan LongCat team has open-sourced LongCat-Flash-Prover, a 560B-parameter MoE model specialized in mathematical reasoning tasks for the formal theorem proving language Lean4. The model weights are released under the MIT license and are now available on GitHub, Hugging Face, and ModelScope.

The model decomposes formal reasoning into three distinct capabilities: autoformalization (translating natural-language math problems into Lean4 formal statements), sketch generation (producing lemma-style proof skeletons), and full proof generation. Each capability is paired with agent-based tool use built on tool-integrated reasoning (TIR), allowing the model to interact with the Lean4 compiler in real time for validation.
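
A toy illustration of the three stages in Lean4 (a hypothetical example for exposition, not output from the model):

```lean
-- Stage 1, autoformalization: the informal problem
--   "For every natural number n, n + 0 = n"
-- becomes a formal Lean4 statement.

-- Stage 2, sketch generation: a proof skeleton whose subgoal is
-- left as `sorry`, to be filled in later.
theorem add_zero_sketch (n : Nat) : n + 0 = n := by
  sorry

-- Stage 3, full proof generation: the completed proof, which the
-- Lean4 compiler can check end to end.
theorem add_zero_proved (n : Nat) : n + 0 = n := by
  rfl
```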


For training, the team introduces a Hybrid-Experts Iteration Framework to generate cold-start data. In the reinforcement learning phase, it applies the HisPO algorithm to stabilize long-horizon task training in the MoE model, and adds theorem-consistency and validity checks to prevent reward hacking.
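
The reward-hacking guard can be sketched as a binary, compiler-verified reward: a candidate earns reward only if the Lean4 compiler accepts the proof and the proved statement matches the one originally asked. This is a minimal illustrative sketch (names and logic are assumptions, not LongCat's actual implementation):

```python
def proof_reward(original_stmt: str, proved_stmt: str, compiler_ok: bool) -> float:
    """Return 1.0 only if the candidate proof compiles AND proves the
    original theorem.

    - compiler_ok: did the Lean4 compiler accept the candidate proof?
    - consistency: the statement actually proved must equal the statement
      that was posed, so the policy cannot earn reward by silently
      weakening or rewriting the goal.
    """
    stmt_consistent = original_stmt.strip() == proved_stmt.strip()
    return 1.0 if (compiler_ok and stmt_consistent) else 0.0
```

In practice a consistency check would compare parsed Lean terms rather than raw strings, but the principle is the same: validity and consistency are both required before any reward is granted.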

Benchmark results show that LongCat-Flash-Prover sets new SOTA performance among open-weight models in both autoformalization and theorem proving. On MiniF2F-Test it attains a 97.1% pass rate within 72 inference attempts; on ProverBench and PutnamBench it scores 70.8% and 41.5% respectively, using no more than 220 inference attempts per problem.
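
The "97.1% within 72 inference attempts" figure is a pass@k-style result: a problem counts as solved if any one of k sampled proofs verifies. Assuming the standard unbiased estimator is used (the article does not specify how the number was computed), it looks like this:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n total attempts per problem of
    which c succeeded, estimate the probability that at least one of k
    randomly drawn attempts succeeds."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 2 attempts of which c = 1 succeeded, a single draw (k = 1) succeeds with probability 0.5.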

#Reinforcement Learning

Disclaimer: Contains third-party opinions, does not constitute financial advice

Recommended Reading
China's Open-Source Models Reemerge as Global AI Foundation: The Cursor "Shell-Skimming" Controversy Over Kimi Reverses Course—from Infringement Allegations to Licensing Partnership
Cursor's "in-house" Composer 2 Embeds Kimi K2.5; Moonshot AI Accuses: Violation of License Terms and Unpaid Fees
Model Iterating on Itself: MiniMax M2.7 Launches, Bringing Coding Capabilities Close to Opus Level
Following Baidu, Tencent has also become a sponsor of OpenClaw
GitHub Data Reveals Massive Exodus of Crypto Developers Since 2022, With AI Emerging as the New Destination for Developer Migration
Karpathy Open Sources AutoResearch: Sleep One Night, Let AI Automatically Run 100 LLM Training Iterations
Alibaba-Affiliated Research Team: AI Agent ROME Initiates Cryptocurrency Mining Without Authorization