TheoremExplainBench
Source: ModelScope community (updated 2025-11-27, indexed 2025-03-08)
Download link: https://modelscope.cn/datasets/TIGER-Lab/TheoremExplainBench
# TheoremExplainBench
<!-- Provide a quick summary of the dataset. -->
TheoremExplainBench is a dataset designed to evaluate and improve the ability of large language models (LLMs) to understand and explain mathematical and scientific theorems across multiple domains through long-form multimodal content (e.g., Manim videos). It consists of 240 theorems, categorized by difficulty and subject area to enable structured benchmarking.
## Dataset Details
<!-- Provide a longer summary of what this dataset is. -->
- **Curated by:** Max Ku, Thomas Chong
- **Language(s) (NLP):** English
- **License:** MIT
- **Repository:** https://github.com/TIGER-AI-Lab/TheoremExplainAgent
- **Paper:** https://huggingface.co/papers/2502.19400
- **arXiv paper:** https://arxiv.org/abs/2502.19400
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
The dataset is intended for evaluating how well LLMs explain mathematical and scientific theorems by generating long-form Manim videos. Potential applications include:
* Model evaluation: assessing LLMs' theorem comprehension and explanatory capabilities in other forms of multimodal content (e.g., text plus N animations).
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
The dataset contains 240 theorems distributed across:
Difficulty Levels:
* Easy: 80 theorems
* Medium: 80 theorems
* Hard: 80 theorems
Subject Areas (evenly split within each difficulty level):
* Computer Science: 20 theorems
* Mathematics: 20 theorems
* Physics: 20 theorems
* Chemistry: 20 theorems
For each theorem we provide a "description", which does not necessarily illustrate the theorem in full. It serves only as context to help the LLM tell which usage of the theorem is intended.
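
The split described in this section works out to 3 difficulty levels × 4 subjects × 20 theorems = 240. A minimal sketch of how that layout could be sanity-checked is shown below. The record keys `difficulty` and `subject`, the helper `check_distribution`, and the synthetic placeholder records are assumptions made for illustration; the card does not state the actual column names.

```python
from collections import Counter
from itertools import product

DIFFICULTIES = ("Easy", "Medium", "Hard")
SUBJECTS = ("Computer Science", "Mathematics", "Physics", "Chemistry")
PER_CELL = 20  # theorems per (difficulty, subject) pair, as stated in the card


def check_distribution(records):
    """Return True if every (difficulty, subject) pair has exactly PER_CELL records.

    `records` is any iterable of dicts. The keys "difficulty" and "subject"
    are assumed names, not confirmed column names of the dataset.
    """
    counts = Counter((r["difficulty"], r["subject"]) for r in records)
    expected = {pair: PER_CELL for pair in product(DIFFICULTIES, SUBJECTS)}
    return dict(counts) == expected


# Synthetic stand-in records that follow the stated layout (not real data).
synthetic = [
    {"difficulty": d, "subject": s, "theorem": f"placeholder-{d}-{s}-{i}"}
    for d, s in product(DIFFICULTIES, SUBJECTS)
    for i in range(PER_CELL)
]

total = len(synthetic)
per_difficulty = Counter(r["difficulty"] for r in synthetic)
per_subject = Counter(r["subject"] for r in synthetic)

print(total)                          # 3 * 4 * 20
print(dict(per_difficulty))           # 80 per difficulty level
print(dict(per_subject))              # 60 per subject across all levels
print(check_distribution(synthetic))
```

With real data, the same function could be applied to the loaded records once the assumed keys are mapped to the dataset's actual field names. Note that each subject has 20 theorems per difficulty level, so 60 in total.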
## Dataset Creation
Theorems were collected from:
* LibreTexts
* OpenStax
## Citation
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
```bibtex
@misc{ku2025theoremexplainagentmultimodalexplanationsllm,
title={TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding},
author={Max Ku and Thomas Chong and Jonathan Leung and Krish Shah and Alvin Yu and Wenhu Chen},
year={2025},
eprint={2502.19400},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2502.19400},
}
```
## Dataset Card Contact
Contact: Max Ku (@vinesmsuic)
Provider: maas
Created: 2025-02-23



