TIGER-Lab/LongRAG

Name: TIGER-Lab/LongRAG
Creator: TIGER-Lab
Published: 2024-06-26 13:26:27
License: Not specified

Source: Hugging Face. Updated: 2024-06-26. Indexed: 2024-06-22.

Download link:

https://hf-mirror.com/datasets/TIGER-Lab/LongRAG

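Resource summary:

The LongRAG framework comprises several datasets intended to improve the performance of retrieval-augmented generation (RAG) models. The nq and hotpot_qa datasets hold the retrieval output and reader input for NQ and HotpotQA, respectively. nq_corpus and hotpot_qa_corpus are the retrieval corpora, based on Wikipedia data from December 2018 and October 2017, respectively. The answer_extract_example dataset contains examples used to extract a short answer from a long answer. nq_wiki and hotpot_qa_wiki are processed Wikipedia data, likewise based on the December 2018 and October 2017 Wikipedia data, respectively.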

This dataset includes multiple configurations for the LongRAG framework, involving retrieval output and reader input for the NQ and HotpotQA datasets, as well as corpus data used for these tasks. Additionally, it includes processed Wikipedia data used for these tasks. Each configuration details the types of data fields, their formats, and their purpose within the LongRAG framework.

Provider:

TIGER-Lab

Original information summary

Dataset overview

Dataset configurations

answer_extract_example
- Features:
  - question: string
  - answers: sequence of strings
  - short_answer: string
  - long_answer: string
- Splits:
  - train: 2239 bytes, 8 examples
- Download size: 5937 bytes
- Dataset size: 2239 bytes
hotpot_qa
- Features:
  - query_id: 64-bit integer
  - query: string
  - answer: sequence of strings
  - sp: sequence of strings
  - type: string
  - context_titles: sequence of strings
  - context: string
- Splits:
  - full: 1118201401 bytes, 7405 examples
  - subset_1000: 151675133 bytes, 1000 examples
  - subset_100: 15173459 bytes, 100 examples
- Download size: 683309128 bytes
- Dataset size: 1285049993 bytes
hotpot_qa_corpus
- Features:
  - corpus_id: 64-bit integer
  - titles: sequence of strings
  - text: string
- Splits:
  - train: 1671047802 bytes, 509493 examples
- Download size: 880955518 bytes
- Dataset size: 1671047802 bytes
nq
- Features:
  - query_id: string
  - query: string
  - answer: sequence of strings
  - context_titles: sequence of strings
  - context: string
- Splits:
  - full: 379137147 bytes, 3610 examples
  - subset_1000: 106478843 bytes, 1000 examples
  - subset_100: 9986104 bytes, 100 examples
- Download size: 283296797 bytes
- Dataset size: 495602094 bytes
nq_corpus
- Features:
  - corpus_id: 64-bit integer
  - titles: sequence of strings
  - text: string
- Splits:
  - train: 12054791599 bytes, 604351 examples
- Download size: 6942402166 bytes
- Dataset size: 12054791599 bytes
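
The configurations and splits listed above could be loaded with the Hugging Face `datasets` library along the following lines. This is a minimal sketch rather than the authors' own loading code: the helper names `check_split` and `load_longrag` are hypothetical, and it assumes the `datasets` package is installed and that the repository is reachable under the ID `TIGER-Lab/LongRAG`.

```python
# Configurations and splits exactly as listed in the overview above.
LONGRAG_SPLITS = {
    "answer_extract_example": ["train"],
    "hotpot_qa": ["full", "subset_1000", "subset_100"],
    "hotpot_qa_corpus": ["train"],
    "nq": ["full", "subset_1000", "subset_100"],
    "nq_corpus": ["train"],
}

REPO_ID = "TIGER-Lab/LongRAG"


def check_split(config, split):
    """Validate a (config, split) pair against the listing (hypothetical helper)."""
    if config not in LONGRAG_SPLITS:
        raise ValueError(f"unknown configuration: {config!r}")
    if split not in LONGRAG_SPLITS[config]:
        raise ValueError(f"configuration {config!r} has no split {split!r}")
    return config, split


def load_longrag(config, split):
    """Load one split of one configuration (hypothetical helper, needs network)."""
    # Imported lazily so the validation helper works without `datasets` installed.
    from datasets import load_dataset

    config, split = check_split(config, split)
    return load_dataset(REPO_ID, config, split=split)
```

Under these assumptions, `load_longrag("nq", "subset_100")` would fetch the 100-example NQ subset, which is a much lighter starting point than the multi-gigabyte `nq_corpus` configuration.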

Data file configuration

answer_extract_example
- Splits:
  - train: answer_extract_example/train-*
hotpot_qa
- Splits:
  - full: hotpot_qa/full-*
  - subset_1000: hotpot_qa/subset_1000-*
  - subset_100: hotpot_qa/subset_100-*
hotpot_qa_corpus
- Splits:
  - train: hotpot_qa_corpus/train-*
nq
- Splits:
  - full: nq/full-*
  - subset_1000: nq/subset_1000-*
  - subset_100: nq/subset_100-*
nq_corpus
- Splits:
  - train: nq_corpus/train-*
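
Each split above is defined by a glob pattern over files in the repository. The sketch below shows how such patterns can be resolved to a (configuration, split) pair using Python's standard `fnmatch` module. The helper `split_for` and the file names used with it are hypothetical illustrations, since the actual shard names are not given here.

```python
from fnmatch import fnmatch

# Glob patterns copied from the data file configuration above.
DATA_FILES = {
    "answer_extract_example": {"train": "answer_extract_example/train-*"},
    "hotpot_qa": {
        "full": "hotpot_qa/full-*",
        "subset_1000": "hotpot_qa/subset_1000-*",
        "subset_100": "hotpot_qa/subset_100-*",
    },
    "hotpot_qa_corpus": {"train": "hotpot_qa_corpus/train-*"},
    "nq": {
        "full": "nq/full-*",
        "subset_1000": "nq/subset_1000-*",
        "subset_100": "nq/subset_100-*",
    },
    "nq_corpus": {"train": "nq_corpus/train-*"},
}


def split_for(path):
    """Return the (config, split) whose pattern matches `path`, or None."""
    for config, splits in DATA_FILES.items():
        for split, pattern in splits.items():
            if fnmatch(path, pattern):
                return config, split
    return None
```

Because every pattern ends in `-*`, the trailing hyphen keeps `subset_100-*` from also matching files that belong to `subset_1000`.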
