m-a-p/CriticLeanBench

Name: m-a-p/CriticLeanBench
Creator: m-a-p
Published: 2025-07-09 01:58:57
License: No description available

Hugging Face. Updated 2025-07-09. Indexed 2025-08-09.

Download link:

https://hf-mirror.com/datasets/m-a-p/CriticLeanBench

Resource summary:

CriticLeanBench is a comprehensive benchmark for evaluating the critical reasoning capabilities of models when natural-language mathematical statements are translated into formally verified theorem declarations in Lean 4. It contains 500 human-verified problem pairs covering a range of mathematical domains and difficulty levels, and it provides detailed error annotations for the incorrect samples.
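As a rough illustration of the kind of judgment described above, the sketch below pairs one natural-language statement with a faithful and a flawed Lean 4 declaration. It is a hypothetical example and is not taken from the dataset: the statement, the theorem names, and the chosen error are all assumptions made for illustration. The proofs are left as `sorry` because only the declarations are being compared.

```lean
-- Hypothetical example, not taken from CriticLeanBench.
-- Natural-language statement: "The sum of two even natural numbers is even."

-- Faithful formalization: both summands are assumed to be even.
theorem sum_even_faithful (a b : Nat)
    (ha : a % 2 = 0) (hb : b % 2 = 0) : (a + b) % 2 = 0 := by
  sorry

-- Flawed formalization: the evenness hypothesis on `b` is missing,
-- so the declaration no longer matches the statement
-- (it fails for a = 0, b = 1).
theorem sum_even_flawed (a b : Nat)
    (ha : a % 2 = 0) : (a + b) % 2 = 0 := by
  sorry
```

A model under evaluation would be expected to accept the first declaration and to reject the second, ideally naming the missing hypothesis as the reason.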

Provided by:

m-a-p
