
TheoremExplainBench

Source: ModelScope community. Updated: 2025-11-27. Indexed: 2025-03-08.
Download link:
https://modelscope.cn/datasets/TIGER-Lab/TheoremExplainBench
Resource description:
# TheoremExplainBench

TheoremExplainBench is a dataset designed to evaluate and improve the ability of large language models (LLMs) to understand and explain mathematical and scientific theorems across multiple domains through long-form multimodal content (e.g. Manim videos). It consists of 240 theorems, categorized by difficulty and subject area to enable structured benchmarking.

## Dataset Details

- **Curated by:** Max Ku, Thomas Chong
- **Language(s) (NLP):** English
- **License:** MIT
- **Repository:** https://github.com/TIGER-AI-Lab/TheoremExplainAgent
- **Paper:** https://huggingface.co/papers/2502.19400
- **Arxiv Paper:** https://arxiv.org/abs/2502.19400

## Uses

The dataset is intended for evaluating how well LLMs explain mathematical and scientific theorems by generating long-form Manim videos.

Potential applications include:

* Model evaluation: assessing LLMs' theorem comprehension and explanatory capabilities in other forms of multimodal content (e.g. text + N animations)

## Dataset Structure

The dataset contains 240 theorems distributed across:

Difficulty Levels:

* Easy: 80 theorems
* Medium: 80 theorems
* Hard: 80 theorems

Subject Areas (evenly split within each difficulty level, so each count below is per difficulty level):

* Computer Science: 20 theorems
* Mathematics: 20 theorems
* Physics: 20 theorems
* Chemistry: 20 theorems

For each theorem we provide a "description", which does not necessarily illustrate the theorem in full. It serves only as context that helps the LLM distinguish the setting in which the theorem is used.
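The split arithmetic above (3 difficulty levels × 4 subject areas × 20 theorems = 240) can be checked with a short script. This is a minimal sketch, not part of the official tooling: the column names `difficulty` and `subject` are assumptions made for illustration, and the records are synthetic stand-ins rather than rows from the actual dataset.

```python
from collections import Counter
from itertools import product

# Values taken from the dataset card.
DIFFICULTIES = ("Easy", "Medium", "Hard")
SUBJECTS = ("Computer Science", "Mathematics", "Physics", "Chemistry")
THEOREMS_PER_CELL = 20  # theorems per (difficulty, subject) pair


def expected_counts():
    """Return the stated theorem count for every (difficulty, subject) cell."""
    return {cell: THEOREMS_PER_CELL for cell in product(DIFFICULTIES, SUBJECTS)}


def split_deviations(records, difficulty_key="difficulty", subject_key="subject"):
    """Return {cell: (observed, expected)} for cells that deviate from the card.

    The key names are hypothetical; adjust them to the real column names.
    """
    observed = Counter((r[difficulty_key], r[subject_key]) for r in records)
    expected = expected_counts()
    cells = set(expected) | set(observed)
    return {
        cell: (observed.get(cell, 0), expected.get(cell, 0))
        for cell in sorted(cells)
        if observed.get(cell, 0) != expected.get(cell, 0)
    }


# Synthetic stand-in records that follow the stated distribution exactly.
records = [
    {"difficulty": d, "subject": s}
    for (d, s), n in expected_counts().items()
    for _ in range(n)
]
per_difficulty = Counter(r["difficulty"] for r in records)

print(len(records))               # 240
print(dict(per_difficulty))       # {'Easy': 80, 'Medium': 80, 'Hard': 80}
print(split_deviations(records))  # {}
```

Passing the real rows to `split_deviations` (with the key names adjusted) would list any cell whose count differs from the card; an empty result means the data matches the stated 20-per-cell split.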
## Dataset Creation

Theorems were collected from:

* LibreTexts
* OpenStax

## Citation

**BibTeX:**

```bibtex
@misc{ku2025theoremexplainagentmultimodalexplanationsllm,
  title={TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding},
  author={Max Ku and Thomas Chong and Jonathan Leung and Krish Shah and Alvin Yu and Wenhu Chen},
  year={2025},
  eprint={2502.19400},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2502.19400},
}
```

## Dataset Card Contact

Contact: Max Ku (@vinesmsuic)

Provider:
maas
Created:
2025-02-23