miracl/hagrid
Source: Hugging Face · Updated 2023-08-01 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/miracl/hagrid
Resource description:
---
license: apache-2.0
language:
- en
pretty_name: HAGRID
size_categories:
- 1K<n<10K
---
# HAGRID: A Human-LLM Collaborative Dataset for Generative Information-seeking with Attribution
HAGRID (**H**uman-in-the-loop **A**ttributable **G**enerative **R**etrieval for **I**nformation-seeking **D**ataset)
is a dataset for generative information-seeking scenarios.
It is constructed on top of MIRACL 🌍🙌🌏, an information retrieval dataset that consists of queries along with a set of manually labelled relevant passages (quotes).
## Dataset Structure
To load the dataset:
```python
import datasets
hagrid = datasets.load_dataset("miracl/hagrid", split="train")
print(hagrid[0])
```
Each printed record has the following structure:
```python
{
'query': ...,
'query_id': ...,
'quotes': [{ # a list of quotes that are manually labeled as relevant to the query
'docid': ...,
'idx': ...,
'text': ...
}, ...],
'answers': [{
'answer': ..., # the complete answer generated by LLM
'attributable': 1/0/None, # 1: attributable; 0: unattributable; None: unlabeled
'informative': 1/0, # 1: informative; 0: uninformative
'sentences': [{ # answers split into sentences
'index': ...,
'attributable': 0/1/None,
'informative': 0/1/None,
'text': ...,
}, ...]
}, ...]
}
```
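A minimal sketch of how the answer-level labels above might be used: keep only the answers that annotators marked as both attributable and informative. It assumes each record is a plain Python dict with the nested layout shown above; the helper name `select_good_answers` and the toy record are hypothetical and are not real HAGRID data or part of the `datasets` API.

```python
from typing import Any, Dict, List


def select_good_answers(record: Dict[str, Any]) -> List[str]:
    """Return answer texts labeled attributable == 1 and informative == 1.

    Hypothetical helper. Answers whose `attributable` label is None
    (unlabeled) or 0 (unattributable) are skipped.
    """
    return [
        a["answer"]
        for a in record.get("answers", [])
        if a.get("attributable") == 1 and a.get("informative") == 1
    ]


# Hypothetical toy record shaped like the schema above (made-up values).
toy_record = {
    "query": "example query",
    "query_id": "q0",
    "quotes": [{"docid": "d0#0", "idx": 1, "text": "example passage"}],
    "answers": [
        {"answer": "A", "attributable": 1, "informative": 1, "sentences": []},
        {"answer": "B", "attributable": None, "informative": 1, "sentences": []},
        {"answer": "C", "attributable": 0, "informative": 1, "sentences": []},
    ],
}

print(select_good_answers(toy_record))  # ['A']
```

If rows of the loaded `hagrid` split are returned with this same nested layout, the helper can be applied to each row in turn; that layout is an assumption worth checking against `print(hagrid[0])` first.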
Provided by:
miracl
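Summary of Original Information
HAGRID Dataset Overview
Basic Information
- License: Apache-2.0
- Language: English
- Dataset name: HAGRID
- Dataset size: 1K<n<10K
Dataset Description
HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) is a dataset for generative information-seeking scenarios. It is built on top of MIRACL, an information retrieval dataset consisting of queries, each paired with a set of manually labeled relevant passages (quotes).
Dataset Structure
The dataset is structured as follows: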
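- Query (query)
- Query ID (query_id)
- Quotes (quotes): a list of quotes manually labeled as relevant to the query; each quote contains a document ID (docid), an index (idx), and text (text).
- Answers (answers): complete answers generated by an LLM; each answer contains:
  - Answer text (answer)
  - Attributability (attributable): 1 means attributable, 0 means unattributable, None means unlabeled
  - Informativeness (informative): 1 means informative, 0 means uninformative
  - Sentences (sentences): the answer split into sentences; each sentence contains an index (index), attributability (attributable), informativeness (informative), and text (text).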
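Because attributability is also labeled per sentence, and some sentence labels may be None, one illustrative way to summarize an answer is the fraction of its labeled sentences that are attributable. This is a sketch, not an official HAGRID evaluation metric; the function name `sentence_attribution_rate` and the toy answer below are hypothetical.

```python
from typing import Any, Dict, Optional


def sentence_attribution_rate(answer: Dict[str, Any]) -> Optional[float]:
    """Fraction of labeled sentences marked attributable (1).

    Hypothetical helper. Sentences whose `attributable` label is None are
    excluded; returns None when no sentence in the answer carries a label.
    """
    labels = [
        s["attributable"]
        for s in answer.get("sentences", [])
        if s.get("attributable") is not None
    ]
    if not labels:
        return None
    return sum(labels) / len(labels)


# Hypothetical toy answer shaped like the schema (made-up values).
toy_answer = {
    "answer": "S1. S2. S3.",
    "attributable": 1,
    "informative": 1,
    "sentences": [
        {"index": 0, "attributable": 1, "informative": 1, "text": "S1."},
        {"index": 1, "attributable": 0, "informative": 1, "text": "S2."},
        {"index": 2, "attributable": None, "informative": None, "text": "S3."},
    ],
}

print(sentence_attribution_rate(toy_answer))  # 0.5
```

Excluding None labels rather than counting them as 0 keeps unlabeled sentences from being mistaken for unattributable ones.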



