granola-entity-questions
ModelScope Community · updated 2025-12-05 · indexed 2025-04-26
Download link:
https://modelscope.cn/datasets/google/granola-entity-questions
Resource summary:
# GRANOLA Entity Questions Dataset Card
## Dataset details
**Dataset Name**: GRANOLA-EQ (**Gran**ularity **o**f **La**bels **E**ntity **Q**uestions)
**Paper**: [Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers](https://arxiv.org/abs/2401.04695)
**Abstract**: Factual questions typically can be answered correctly at different levels of granularity. For example, both "August 4, 1961" and "1961" are correct answers to the question "When was Barack Obama born?". Standard question answering (QA) evaluation protocols, however, do not explicitly take this into account and compare a predicted answer against answers of a single granularity level. In this work, we propose GRANOLA QA, a novel evaluation setting where a predicted answer is evaluated in terms of accuracy and informativeness against a set of multi-granularity answers. We present a simple methodology for enriching existing datasets with multi-granularity answers, and create GRANOLA-EQ, a multi-granularity version of the EntityQuestions dataset. We evaluate a range of decoding methods on GRANOLA-EQ, including a new algorithm, called Decoding with Response Aggregation (DRAG), that is geared towards aligning the response granularity with the model's uncertainty. Our experiments show that large language models with standard decoding tend to generate specific answers, which are often incorrect. In contrast, when evaluated on multi-granularity answers, DRAG yields a nearly 20 point increase in accuracy on average, which further increases for rare entities. Overall, this reveals that standard evaluation and decoding schemes may significantly underestimate the knowledge encapsulated in LMs.
**Language**: English
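To make the idea behind DRAG (matching answer granularity to model uncertainty) concrete, here is a toy sketch restricted to date answers. It is not the authors' implementation: it assumes the sampled model responses are already normalized to `YYYY-MM-DD` strings (or truncated `YYYY-MM` / `YYYY`), and it swaps in a simple hand-written rule in place of the paper's aggregation step, keeping only the leading date components on which every sample agrees. The helper name `aggregate_dates` is hypothetical.

```python
from typing import List, Optional


def aggregate_dates(samples: List[str]) -> Optional[str]:
    """Toy DRAG-style aggregation for date answers (hypothetical helper).

    Each sample is "YYYY-MM-DD", "YYYY-MM" or "YYYY". The result keeps only
    the leading components (year, then month, then day) shared by all samples,
    so disagreement between samples, a sign of uncertainty, yields a coarser
    answer. Returns None if the samples disagree even on the year.
    """
    if not samples:
        return None
    split_samples = [s.split("-") for s in samples]
    shared: List[str] = []
    # zip() stops at the shortest sample, so a year-only sample caps the
    # result at year granularity.
    for parts in zip(*split_samples):
        if all(p == parts[0] for p in parts):
            shared.append(parts[0])
        else:
            break
    return "-".join(shared) if shared else None


print(aggregate_dates(["1961-08-04", "1961-08-04", "1961-08-05"]))  # 1961-08
print(aggregate_dates(["1961-08-04"] * 3))  # 1961-08-04
print(aggregate_dates(["1961-08-04", "1962-01-01"]))  # None
```

When the samples disagree only on the day, the sketch backs off from a full date to the month, in the same spirit as "1961" being a correct but less informative answer than "August 4, 1961".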
## Dataset Structure
#### Annotation overview
GRANOLA-EQ was constructed with a simple, general methodology that adapts an existing single-granularity QA dataset to the GRANOLA QA setting without any human labor.
The process first retrieves additional information about the entities in the original question and answer(s) from an external knowledge graph (KG), and then uses an LLM to form multi-granularity answers conditioned on this information.
We applied this methodology to the test split of the EntityQuestions dataset (Sciavolino et al., 2021), using WikiData (Vrandečić and Krötzsch, 2014) as the KG and PaLM-2-L as the LLM.
The resulting dataset, GRANOLA-EQ, consists of 12K QA examples with an average of 2.9 multi-granularity answers per question. A manual analysis of a random subset of the data shows that the automatic procedure yields highly accurate answers.
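The LLM step can be pictured as filling a prompt with the original QA pair and the KG descriptions of its entities. The sketch below only illustrates that idea: the template wording and the helper name `build_granola_prompt` are hypothetical and are not the prompt used with PaLM-2-L. It reads the field names listed in the dataset overview, and the example descriptions are placeholders, not values fetched from WikiData.

```python
from typing import Mapping


def build_granola_prompt(row: Mapping[str, str]) -> str:
    """Build a prompt asking an LLM for multi-granularity answers.

    `row` uses field names from the dataset card (question, answer,
    question_entity_description, answer_entity_description).
    The template wording itself is hypothetical.
    """
    return "\n".join([
        "List correct answers to the question, from most to least specific.",
        f"Question: {row['question']}",
        f"Original answer: {row['answer']}",
        f"Question entity description: {row['question_entity_description']}",
        f"Answer entity description: {row['answer_entity_description']}",
        "Answers:",
    ])


example_row = {
    "question": "Where was Barack Obama born?",
    "answer": "Honolulu",
    # Placeholder descriptions for illustration, not copied from WikiData.
    "question_entity_description": "44th president of the United States",
    "answer_entity_description": "capital city of Hawaii, United States",
}
print(build_granola_prompt(example_row))
```

Conditioning on the descriptions is what lets the LLM name coarser but still correct answers (such as the state or country of a city) without looking them up itself.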
#### Dataset overview
Each row contains the original QA example from EntityQuestions, together with additional WikiData metadata and the generated multi-granularity answers. The dataset has the following fields:
* *relation*: The relation type
* *question*: The question text
* *question_entity*: The entity in the question
* *question_entity_qid*: The WikiData QID that question_entity was matched to
* *question_entity_description*: The WikiData description for question_entity_qid
* *question_entity_popularity*: The number of pageviews that the Wikipedia page for question_entity received in September 2023
* *answer*: The answer text (an entity)
* *answer_entity_qid*: The WikiData QID that answer was matched to
* *answer_entity_description*: The WikiData description for answer_entity_qid
* *answer_entity_popularity*: The number of pageviews that the Wikipedia page for answer received in September 2023
* *score_for_potential_error*: A computed score intended to capture the likelihood of an error in extracting the descriptions for this row
* *granola_answer_{i}*: The i-th GRANOLA answer
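Because the answers are spread across numbered `granola_answer_{i}` columns, a small helper can gather them into an ordered list and check a prediction against them. This is a minimal sketch under stated assumptions, not the paper's metric: it assumes unused answer slots are empty or `None`, that lower indices are more specific answers, and it uses SQuAD-style normalized exact match plus an illustrative informativeness score of `1 / rank`. The helper names and the example row are hypothetical.

```python
import re
import string
from typing import List, Mapping, Tuple


def granola_answers(row: Mapping[str, object]) -> List[str]:
    """Collect granola_answer_1, granola_answer_2, ... in numeric order, skipping empty slots."""
    indexed = []
    for key, value in row.items():
        match = re.fullmatch(r"granola_answer_(\d+)", key)
        if match and isinstance(value, str) and value.strip():
            indexed.append((int(match.group(1)), value))
    return [value for _, value in sorted(indexed)]


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squash spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def score(prediction: str, answers: List[str]) -> Tuple[bool, float]:
    """Return (correct, informativeness) under the illustrative 1/rank scheme."""
    pred = normalize(prediction)
    for rank, answer in enumerate(answers, start=1):
        if normalize(answer) == pred:
            return True, 1.0 / rank
    return False, 0.0


row = {  # made-up row for illustration
    "question": "When was Barack Obama born?",
    "granola_answer_1": "August 4, 1961",
    "granola_answer_2": "1961",
    "granola_answer_3": None,  # unused slot
}
answers = granola_answers(row)
print(answers)  # ['August 4, 1961', '1961']
print(score("1961", answers))  # (True, 0.5)
print(score("1962", answers))  # (False, 0.0)
```

Sorting on the integer suffix keeps `granola_answer_10` after `granola_answer_2`. Exact match after normalization is deliberately strict; swap in whatever matching criterion your evaluation needs.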
## Usage
Run the following code to load the GRANOLA-EQ dataset. First, log in with a Hugging Face access token that has Read permissions (you can create one [here](https://huggingface.co/settings/tokens)):
```python
from huggingface_hub import notebook_login
notebook_login()
```
Then, load the data with the `datasets` library and convert it to a pandas DataFrame:
```python
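from datasets import load_dataset
# token=True reuses the token saved by notebook_login(); older `datasets` releases used use_auth_token=True
granola = load_dataset("google/granola-entity-questions", token=True)
granola = granola["train"]  # select the "train" split
df = granola.to_pandas()  # requires pandas to be installed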
```
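Since the abstract reports larger gains for rare entities, it can be useful to slice the data by popularity, and `score_for_potential_error` can help drop rows whose descriptions may have been extracted incorrectly. The sketch below works on plain dicts, which is what iterating a `datasets.Dataset` yields. The helper name `select_rows` and both thresholds are hypothetical, and it assumes a higher `score_for_potential_error` means a likelier error, as the field description suggests.

```python
from typing import Dict, Iterable, List


def select_rows(
    rows: Iterable[Dict],
    max_error_score: float,
    max_popularity: int,
) -> List[Dict]:
    """Keep rows that look reliable and whose question entity is rare.

    Both thresholds are hypothetical; choose them from the data.
    Assumes a higher score_for_potential_error means a likelier error.
    """
    selected = []
    for row in rows:
        error = row.get("score_for_potential_error")
        popularity = row.get("question_entity_popularity")
        if error is None or popularity is None:
            continue  # skip rows with missing metadata
        if error <= max_error_score and popularity <= max_popularity:
            selected.append(row)
    return selected


# Toy rows with made-up values; in practice pass the loaded `granola` split.
toy_rows = [
    {"question": "q1", "score_for_potential_error": 0.1, "question_entity_popularity": 120},
    {"question": "q2", "score_for_potential_error": 0.9, "question_entity_popularity": 50},
    {"question": "q3", "score_for_potential_error": 0.2, "question_entity_popularity": 250000},
]
rare = select_rows(toy_rows, max_error_score=0.5, max_popularity=1000)
print([row["question"] for row in rare])  # ['q1']
```

The same per-row condition can also be passed as a predicate to `granola.filter(...)` from the `datasets` library if you prefer to get a filtered `Dataset` back.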
## Citation Information
```
@article{yona2024narrowing,
title={Narrowing the knowledge evaluation gap: Open-domain question answering with multi-granularity answers},
author={Yona, Gal and Aharoni, Roee and Geva, Mor},
journal={arXiv preprint arXiv:2401.04695},
year={2024}
}
```
Provider:
maas
Created:
2025-04-21



