TheoremQA

Name: TheoremQA
Creator: maas
Published: 2025-12-31 23:35:02
License: Not specified

ModelScope community. Updated: 2025-12-31. Indexed: 2024-05-15.

Download link:

https://modelscope.cn/datasets/opencompass/TheoremQA


Resource description:

## Introduction

We propose the first question-answering dataset driven by STEM theorems. We annotated 800 QA pairs covering 350+ theorems across Math, EE&CS, Physics, and Finance. The dataset was collected by human experts and is of very high quality. We provide it as a new benchmark to test the limits of large language models in applying theorems to solve challenging university-level questions. We also provide a pipeline to prompt LLMs and evaluate their outputs with WolframAlpha.

## How to use TheoremQA

```
from datasets import load_dataset

# Load the dataset and print every record in the test split.
dataset = load_dataset("wenhu/TheoremQA")
for d in dataset['test']:
    print(d)
```

## Arxiv Paper

https://arxiv.org/abs/2305.12524

## Code

https://github.com/wenhuchen/TheoremQA/tree/main
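The introduction mentions evaluating model outputs with WolframAlpha; the sketch below does not reproduce that pipeline. It is a simplified, local stand-in that compares a predicted answer against a gold answer, using a relative tolerance for numbers and a case-insensitive match otherwise. The function names, the `Question` and `Answer` record keys, the tolerance value, and the sample records are all assumptions made for illustration, not taken from the official code.

```python
import math


def answers_match(predicted, gold, rel_tol=0.01):
    """Return True if a predicted answer agrees with the gold answer.

    Values that parse as numbers are compared with a relative tolerance
    (rel_tol is an assumed value); anything else falls back to a
    case-insensitive string comparison.
    """
    try:
        return math.isclose(float(predicted), float(gold), rel_tol=rel_tol)
    except (TypeError, ValueError):
        return str(predicted).strip().lower() == str(gold).strip().lower()


def accuracy(examples, predictions):
    """Fraction of examples whose prediction matches the gold answer."""
    if not examples:
        return 0.0
    correct = sum(
        answers_match(pred, ex["Answer"])
        for ex, pred in zip(examples, predictions)
    )
    return correct / len(examples)


# Hypothetical records; the "Question"/"Answer" keys are assumed here.
examples = [
    {"Question": "What is 2 + 2?", "Answer": "4"},
    {"Question": "Is 7 prime?", "Answer": "True"},
]
predictions = ["4.0", "false"]
score = accuracy(examples, predictions)
print(score)
```

A tolerance-based comparison like this is only a rough proxy: it cannot judge symbolic expressions, lists, or unit conversions, which is presumably where an external solver becomes useful.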


Provider: maas

Created: 2024-04-24
