google/granola-entity-questions

Name: google/granola-entity-questions
Creator: google
Published: 2024-08-01 06:13:17
License: No description available

Source: Hugging Face. Updated 2024-08-01; indexed 2025-04-08.

Download link:

https://hf-mirror.com/datasets/google/granola-entity-questions

Resource description:

---
license: cc-by-nd-4.0
task_categories:
- question-answering
language:
- en
pretty_name: GRANOLA-EQ
size_categories:
- 10K<n<100K
---

# GRANOLA Entity Questions Dataset Card

## Dataset details

**Dataset Name**: GRANOLA-EQ (**Gran**ularity **o**f **La**bels **E**ntity **Q**uestions)

**Paper**: [Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers](https://arxiv.org/abs/2401.04695)

**Abstract**: Factual questions typically can be answered correctly at different levels of granularity. For example, both "August 4, 1961" and "1961" are correct answers to the question "When was Barack Obama born?". Standard question answering (QA) evaluation protocols, however, do not explicitly take this into account and compare a predicted answer against answers of a single granularity level. In this work, we propose GRANOLA QA, a novel evaluation setting where a predicted answer is evaluated in terms of accuracy and informativeness against a set of multi-granularity answers. We present a simple methodology for enriching existing datasets with multi-granularity answers, and create GRANOLA-EQ, a multi-granularity version of the EntityQuestions dataset. We evaluate a range of decoding methods on GRANOLA-EQ, including a new algorithm, called Decoding with Response Aggregation (DRAG), that is geared towards aligning the response granularity with the model's uncertainty. Our experiments show that large language models with standard decoding tend to generate specific answers, which are often incorrect. In contrast, when evaluated on multi-granularity answers, DRAG yields a nearly 20 point increase in accuracy on average, which further increases for rare entities. Overall, this reveals that standard evaluation and decoding schemes may significantly underestimate the knowledge encapsulated in LMs.
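To make the GRANOLA QA setting concrete, here is a minimal sketch that scores one prediction against a list of multi-granularity answers. It is not the paper's evaluation code and rests on stated assumptions: a "match" is SQuAD-style exact match after normalization, the answers are assumed to be ordered from most to least specific, and the informativeness value (a geometric decay controlled by a hypothetical `decay` parameter) is only a placeholder for the paper's definition. The helper names `normalize`, `granola_match` and `granola_scores` are hypothetical.

```python
import re
import string
from typing import List, Optional, Tuple


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def granola_match(prediction: str, answers: List[str]) -> Optional[int]:
    """Index of the first answer the prediction matches (assumed most specific first), or None."""
    pred = normalize(prediction)
    for i, answer in enumerate(answers):
        if normalize(answer) == pred:
            return i
    return None


def granola_scores(prediction: str, answers: List[str], decay: float = 0.5) -> Tuple[float, float]:
    """Return (accuracy, informativeness) for one prediction.

    Accuracy is 1.0 if the prediction matches any granularity level.
    Informativeness is a hypothetical placeholder: 1.0 for the first
    (most specific) answer, multiplied by `decay` for each coarser level.
    """
    idx = granola_match(prediction, answers)
    if idx is None:
        return 0.0, 0.0
    return 1.0, decay ** idx


answers = ["August 4, 1961", "1961"]  # example from the abstract
print(granola_scores("August 4, 1961", answers))  # (1.0, 1.0)
print(granola_scores("1961", answers))            # (1.0, 0.5)
print(granola_scores("1962", answers))            # (0.0, 0.0)
```

With the abstract's example, the coarser answer "1961" counts as correct but earns less informativeness than the full date, which is the trade-off a single-granularity exact-match metric cannot express.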
**Language**: English

## Dataset Structure

#### Annotation overview

GRANOLA-EQ was constructed with a simple and general methodology for augmenting an existing single-granularity QA dataset to the GRANOLA QA setting, without any human labor. The process first obtains additional information about the entities in the original questions and answer(s) from an external knowledge graph (KG), and then uses an LLM to form multi-granularity answers conditioned on this information. We applied this methodology to the test split of the EntityQuestions dataset (Sciavolino et al., 2021), using WikiData (Vrandečić and Krötzsch, 2014) as the KG and PaLM-2-L as the LLM. The resulting dataset, GRANOLA-EQ, consists of 12K QA examples with an average of 2.9 multi-granularity answers per question. A manual analysis of a random subset of the data shows that the automatic procedure yields highly accurate answers.

#### Dataset overview

Each row contains the original QA example from EntityQuestions, together with additional WikiData metadata and the generated multi-granularity answers.
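The enrichment step described in the annotation overview (KG information in, multi-granularity answers out) can be pictured as prompt construction followed by output parsing. This is a hypothetical illustration only: the template wording, the helper names `build_enrichment_prompt` and `parse_answers`, and the one-answer-per-line output format are assumptions, not the PaLM-2-L prompt used to build GRANOLA-EQ, and the LLM call itself is omitted. The dictionary keys follow the dataset's column names, and the example row is hand-written.

```python
from typing import Dict, List

# Hypothetical template; the actual prompt used for GRANOLA-EQ is not reproduced here.
PROMPT_TEMPLATE = (
    "Question: {question}\n"
    "Answer: {answer}\n"
    "About {question_entity}: {question_entity_description}\n"
    "About {answer}: {answer_entity_description}\n"
    "List correct answers to the question from most to least specific, "
    "one per line, starting with the original answer.\n"
)


def build_enrichment_prompt(row: Dict[str, str]) -> str:
    """Fill the template with a QA example and its WikiData descriptions."""
    return PROMPT_TEMPLATE.format(
        question=row["question"],
        answer=row["answer"],
        question_entity=row["question_entity"],
        question_entity_description=row["question_entity_description"],
        answer_entity_description=row["answer_entity_description"],
    )


def parse_answers(completion: str) -> List[str]:
    """Turn a one-answer-per-line completion into an ordered, de-duplicated list."""
    answers: List[str] = []
    for line in completion.splitlines():
        answer = line.strip(" -*\t")  # drop surrounding whitespace and bullet markers
        if answer and answer not in answers:
            answers.append(answer)
    return answers


# Illustrative, hand-written row (descriptions are not copied from WikiData).
row = {
    "question": "Where was Barack Obama born?",
    "question_entity": "Barack Obama",
    "question_entity_description": "44th president of the United States",
    "answer": "Honolulu",
    "answer_entity_description": "city in Hawaii, United States",
}
print(build_enrichment_prompt(row))
print(parse_answers("Honolulu\n- Hawaii\nHonolulu\n- United States"))
# ['Honolulu', 'Hawaii', 'United States']
```

Keeping the original answer first and de-duplicating in order preserves a most-to-least-specific ordering, which is the property a multi-granularity evaluation relies on.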
The dataset contains the following fields:

* *relation*: The relation type
* *question*: The question text
* *question_entity*: The entity in the question
* *question_entity_qid*: The WikiData QID that question_entity was matched to
* *question_entity_description*: The WikiData description for question_entity_qid
* *question_entity_popularity*: The number of pageviews that the Wikipedia page corresponding to question_entity received in September 2023
* *answer*: The answer text (an entity)
* *answer_entity_qid*: The WikiData QID that answer was matched to
* *answer_entity_description*: The WikiData description for answer_entity_qid
* *answer_entity_popularity*: The number of pageviews that the Wikipedia page corresponding to answer received in September 2023
* *score_for_potential_error*: A computed score intended to capture the likelihood of an error in the process of extracting descriptions for this row
* *granola_answer_{i}*: The i-th GRANOLA answer

## Usage

Run the following code to load the GRANOLA-EQ dataset. First, log in with a Hugging Face access token that has Read permission (you can create one [here](https://huggingface.co/settings/tokens)):

```python
from huggingface_hub import notebook_login

notebook_login()  # interactive login prompt for Jupyter/Colab notebooks
```

Then load the data:

```python
from datasets import load_dataset

# token=True reuses the saved login token (older datasets releases use use_auth_token=True)
granola = load_dataset("google/granola-entity-questions", token=True)
granola = granola["train"]
df = granola.to_pandas()  # convert the split to a pandas DataFrame
```

## Citation Information

```
@article{yona2024narrowing,
  title={Narrowing the knowledge evaluation gap: Open-domain question answering with multi-granularity answers},
  author={Yona, Gal and Aharoni, Roee and Geva, Mor},
  journal={arXiv preprint arXiv:2401.04695},
  year={2024}
}
```
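Once the data is in a DataFrame, the multi-granularity answers are spread over the `granola_answer_{i}` columns. A minimal pandas sketch for gathering them into one ordered list per question follows, shown on a small hand-made DataFrame rather than the real data. The helper names are hypothetical, and treating `None`, NaN and empty strings as "no answer" is an assumption about how unused answer columns are filled. Applied to the full dataset, the mean list length can be compared with the average of 2.9 answers per question reported in the annotation overview.

```python
from typing import List

import pandas as pd


def granola_answer_columns(df: pd.DataFrame) -> List[str]:
    """Return the granola_answer_{i} column names, ordered numerically by i."""
    cols = [c for c in df.columns if c.startswith("granola_answer_")]
    return sorted(cols, key=lambda c: int(c.rsplit("_", 1)[1]))


def collect_answers(df: pd.DataFrame) -> pd.Series:
    """Gather each row's non-empty GRANOLA answers into a list, in column order."""
    cols = granola_answer_columns(df)
    answers = [
        # Assumption: missing answers appear as None/NaN or empty strings.
        [a for a in values if isinstance(a, str) and a.strip()]
        for values in df[cols].itertuples(index=False, name=None)
    ]
    return pd.Series(answers, index=df.index)


# Hand-made toy rows; real rows would come from the loaded split.
df = pd.DataFrame(
    {
        "question": ["Where was Barack Obama born?", "Where was Marie Curie born?"],
        "granola_answer_1": ["Honolulu", "Warsaw"],
        "granola_answer_2": ["Hawaii", "Poland"],
        "granola_answer_3": ["United States", None],
    }
)
answers = collect_answers(df)
print(answers.tolist())
print(answers.map(len).mean())  # 2.5
```

Sorting the columns by their numeric suffix rather than lexically keeps `granola_answer_10` after `granola_answer_2`, so the answer order is preserved even if a row has many granularity levels.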

Provider:

google
