KBQA-Agent
Source: ModelScope community (updated 2025-11-27, indexed 2025-07-05)
Download link: https://modelscope.cn/datasets/osunlp/KBQA-Agent
**Introduction**
In traditional knowledge base question answering (KBQA) methods, semantic parsing plays a crucial role. It requires a semantic parser to be extensively trained on a vast dataset of labeled examples, typically consisting of question-answer or question-program pairs.
However, the rise of LLMs has shifted this paradigm. LLMs excel at learning from few (or even zero) in-context examples. They use natural language as a general vehicle of thought, which lets them actively navigate and interact with KBs through auxiliary tools without training on comprehensive datasets. This advance suggests that LLMs can sidestep the earlier limitations and remove the dependency on extensive, high-coverage training data.
Such a paradigm is usually encapsulated in the term "language agent" or "LLM agent". Existing KBQA datasets may not be ideal for evaluating this new paradigm, for two reasons: 1) many questions are single-hop queries over the KB, which do not sufficiently challenge the capabilities of LLMs, and 2) established KBQA benchmarks contain tens of thousands of test questions, so evaluating the most capable models, such as GPT-4, on all of them would be extremely costly and often unnecessary.
As a result, we curate KBQA-Agent to offer a more targeted KBQA evaluation for language agents. KBQA-Agent contains 500 complex questions over Freebase, drawn from three existing KBQA datasets: GrailQA, ComplexWebQuestions, and GraphQuestions. To further support future research, we also provide, for each question, the ground truth action sequence (i.e., tool invocations) that a language agent would take to answer it.
**Split**
KBQA-Agent targets a training-free setting (we used a one-shot demo in our original experiments), so it has a single split: the test set.
**Dataset Structure**
- **qid:** The unique id of a question
- **s-expression:** The ground truth logical form, from which we derive the ground truth actions
- **answer:** The list of answer entities
- **question:** The input question
- **actions:** The ground truth sequence of actions, derived from the s-expression
- **entities:** The topic entities mentioned in the question
- **source:** The source of the question (e.g., GrailQA)
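The field list above can be checked programmatically. The following is a minimal sketch, assuming records are available as JSON objects keyed by the field names listed here; the storage format, the value types, and the sample record are all assumptions for illustration, not taken from the dataset itself.

```python
import json
from collections import Counter

# Field names exactly as listed in the dataset card.
EXPECTED_FIELDS = {
    "qid", "s-expression", "answer", "question",
    "actions", "entities", "source",
}


def missing_fields(record):
    """Return the sorted list of documented fields absent from a record."""
    return sorted(EXPECTED_FIELDS - set(record))


def count_by_source(records):
    """Tally questions per source dataset (e.g., GrailQA)."""
    return Counter(r["source"] for r in records)


# Hypothetical record: every value is a placeholder, not real dataset content.
sample_json = """
[{"qid": "example-0",
  "s-expression": "(placeholder)",
  "answer": ["placeholder-answer"],
  "question": "placeholder question?",
  "actions": ["placeholder_action()"],
  "entities": ["placeholder-entity"],
  "source": "GrailQA"}]
"""

records = json.loads(sample_json)
for record in records:
    absent = missing_fields(record)
    if absent:
        print(f"{record.get('qid')}: missing {absent}")

print(count_by_source(records))
```

Running the same two helpers over the real test split would confirm that every record carries all seven fields and show how the 500 questions are distributed across the three source datasets.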
**Citation**
If our paper or related resources prove valuable to your research, we kindly ask that you cite the paper. Please feel free to contact us with any inquiries.
```
@article{Gu2024Middleware,
author = {Yu Gu and Yiheng Shu and Hao Yu and Xiao Liu and Yuxiao Dong and Jie Tang and Jayanth Srinivasa and Hugo Latapie and Yu Su},
title = {Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments},
journal = {arXiv preprint arXiv:2402.14672},
year = {2024}
}
```
Please also cite the original sources of KBQA-Agent:
**GrailQA:**
```
@inproceedings{grailqa,
author = {Yu Gu and Sue Kase and Michelle Vanni and Brian M. Sadler and Percy Liang and Xifeng Yan and Yu Su},
title = {Beyond {I.I.D.:} Three Levels of Generalization for Question Answering on Knowledge Bases},
booktitle = {WWW '21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19-23, 2021},
year = {2021}
}
```
**ComplexWebQ:**
```
@inproceedings{cwq,
author = {Alon Talmor and Jonathan Berant},
title = {The Web as a Knowledge-Base for Answering Complex Questions},
booktitle = {Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers)},
year = {2018}
}
```
**GraphQuestions:**
```
@inproceedings{graphq,
author = {Yu Su and Huan Sun and Brian M. Sadler and Mudhakar Srivatsa and Izzeddin Gur and Zenghui Yan and Xifeng Yan},
title = {On Generating Characteristic-rich Question Sets for QA Evaluation},
booktitle = {Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016},
year = {2016}
}
```
Provider: maas
Created: 2025-07-04



