TheoremExplainBench

Name: TheoremExplainBench
Creator: maas
Published: 2025-11-27 16:24:07
License: (no description provided)

ModelScope community. Updated: 2025-11-27. Indexed: 2025-03-08.

Download link:

https://modelscope.cn/datasets/TIGER-Lab/TheoremExplainBench


Description:

# TheoremExplainBench

TheoremExplainBench is a dataset designed to evaluate and improve the ability of large language models (LLMs) to understand and explain mathematical and scientific theorems across multiple domains through long-form multimodal content (e.g. Manim videos). It consists of 240 theorems, categorized by difficulty and subject area to enable structured benchmarking.

## Dataset Details

- **Curated by:** Max Ku, Thomas Chong
- **Language(s) (NLP):** English
- **License:** MIT
- **Repository:** https://github.com/TIGER-AI-Lab/TheoremExplainAgent
- **Paper:** https://huggingface.co/papers/2502.19400
- **Arxiv Paper:** https://arxiv.org/abs/2502.19400

## Uses

The dataset is intended for evaluating how well LLMs explain mathematical and scientific theorems by generating long-form Manim videos. Potential applications include:

- Model evaluation: assessing LLMs' theorem comprehension and explanatory capabilities in other forms of multimodal content (e.g. text + N animations).

## Dataset Structure

The dataset contains 240 theorems distributed across:

Difficulty Levels:

* Easy: 80 theorems
* Medium: 80 theorems
* Hard: 80 theorems

Subject Areas (evenly split within each difficulty level):

* Computer Science: 20 theorems
* Mathematics: 20 theorems
* Physics: 20 theorems
* Chemistry: 20 theorems

For each theorem we provide a "description", which does not necessarily illustrate the theorem in full. It is only there as context, to help the LLM tell apart the contexts in which the theorem is used.
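The split described above works out to 3 difficulty levels × 4 subject areas × 20 theorems = 240. The sketch below shows one way to check that balance on a list of records. It is an illustration only: the field names `difficulty` and `subject`, the exact label spellings, and the helper names `cell_counts` and `is_balanced` are assumptions, not the dataset's documented schema, and the records are synthetic stand-ins rather than real benchmark entries.

```python
from collections import Counter
from itertools import product

# Label spellings are assumed for illustration; the real dataset may differ.
DIFFICULTIES = ("Easy", "Medium", "Hard")
SUBJECTS = ("Computer Science", "Mathematics", "Physics", "Chemistry")
PER_CELL = 20  # theorems per (difficulty, subject) pair, as stated in the card


def cell_counts(records):
    """Count records per (difficulty, subject) pair."""
    return Counter((r["difficulty"], r["subject"]) for r in records)


def is_balanced(records):
    """Return True if every expected pair has exactly PER_CELL records
    and no unexpected pairs are present."""
    expected = {cell: PER_CELL for cell in product(DIFFICULTIES, SUBJECTS)}
    return dict(cell_counts(records)) == expected


# Synthetic stand-in records (hypothetical schema), one per theorem slot.
records = [
    {"difficulty": d, "subject": s, "theorem": f"{d}-{s}-{i}"}
    for d, s in product(DIFFICULTIES, SUBJECTS)
    for i in range(PER_CELL)
]

print(len(records))          # 240
print(is_balanced(records))  # True
```

To run the same check on the real data, replace `records` with the loaded rows and adjust the two field names to whatever the dataset actually uses.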
## Dataset Creation

Theorems were collected from:

* LibreTexts
* OpenStax

## Citation

**BibTeX:**

```bibtex
@misc{ku2025theoremexplainagentmultimodalexplanationsllm,
      title={TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding},
      author={Max Ku and Thomas Chong and Jonathan Leung and Krish Shah and Alvin Yu and Wenhu Chen},
      year={2025},
      eprint={2502.19400},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.19400},
}
```

## Dataset Card Contact

Contact: Max Ku (@vinesmsuic)


Provider: maas
Created: 2025-02-23
