TheoremQA
Source: ModelScope community · Updated 2025-12-31 · Indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/opencompass/TheoremQA
Resource description:
## Introduction
We propose the first question-answering dataset driven by STEM theorems. We annotated 800 QA pairs covering 350+ theorems across Math, EE&CS, Physics, and Finance. The dataset was collected by human experts and is of very high quality. We provide it as a new benchmark to test the limits of large language models (LLMs) in applying theorems to solve challenging university-level questions. We also provide a pipeline to prompt LLMs and evaluate their outputs with WolframAlpha.
## How to use TheoremQA
```
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub.
dataset = load_dataset("wenhu/TheoremQA")

# Print every record in the test split.
for d in dataset['test']:
    print(d)
```
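The introduction mentions a pipeline that prompts LLMs and evaluates their outputs. As a rough illustration of those two steps, here is a minimal offline sketch. It is not the authors' pipeline and does not call WolframAlpha: `build_prompt` and `is_correct` are hypothetical helpers, the prompt wording is made up, and the 1% relative tolerance is an arbitrary assumption.

```python
import math


def build_prompt(question: str) -> str:
    """Wrap a question in a simple instruction (hypothetical template)."""
    return (
        "Answer the following university-level question. "
        "End with 'Therefore, the answer is <answer>'.\n\n"
        f"Question: {question}"
    )


def is_correct(prediction: str, answer: str, rel_tol: float = 0.01) -> bool:
    """Compare a predicted answer string with a reference answer string.

    Numeric answers match within a relative tolerance (assumed value);
    anything else falls back to a case-insensitive string comparison.
    """
    try:
        return math.isclose(
            float(prediction), float(answer), rel_tol=rel_tol, abs_tol=1e-9
        )
    except ValueError:
        # Not parseable as numbers, e.g. boolean or free-text answers.
        return prediction.strip().lower() == answer.strip().lower()
```

To use it with the dataset, print one record first to confirm the actual field names, then pass the question text to `build_prompt` and the reference answer to `is_correct`. List-valued or symbolic answers would need extra handling that this sketch does not attempt.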
## arXiv Paper
https://arxiv.org/abs/2305.12524
## Code
https://github.com/wenhuchen/TheoremQA/tree/main
Provider:
maas
Created:
2024-04-24



