tyzhu/lmind_nq_train6000_eval6489_v1_docidx_v3
Source: Hugging Face. Updated 2024-06-04. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/tyzhu/lmind_nq_train6000_eval6489_v1_docidx_v3
Resource description:
---
dataset_info:
  features:
  - name: answers
    struct:
    - name: answer_start
      sequence: 'null'
    - name: text
      sequence: string
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  splits:
  - name: train_qa
    num_bytes: 697367
    num_examples: 6000
  - name: train_ic_qa
    num_bytes: 4540536
    num_examples: 6000
  - name: train_recite_qa
    num_bytes: 4546536
    num_examples: 6000
  - name: eval_qa
    num_bytes: 752802
    num_examples: 6489
  - name: eval_ic_qa
    num_bytes: 4906186
    num_examples: 6489
  - name: eval_recite_qa
    num_bytes: 4912675
    num_examples: 6489
  - name: all_docs
    num_bytes: 7126313
    num_examples: 10925
  - name: all_docs_eval
    num_bytes: 7125701
    num_examples: 10925
  - name: train
    num_bytes: 4161737.5480059083
    num_examples: 10925
  - name: validation
    num_bytes: 752802
    num_examples: 6489
  download_size: 25667885
  dataset_size: 39522655.54800591
configs:
- config_name: default
  data_files:
  - split: train_qa
    path: data/train_qa-*
  - split: train_ic_qa
    path: data/train_ic_qa-*
  - split: train_recite_qa
    path: data/train_recite_qa-*
  - split: eval_qa
    path: data/eval_qa-*
  - split: eval_ic_qa
    path: data/eval_ic_qa-*
  - split: eval_recite_qa
    path: data/eval_recite_qa-*
  - split: all_docs
    path: data/all_docs-*
  - split: all_docs_eval
    path: data/all_docs_eval-*
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---
This dataset is designed for question answering. Each example has three fields: answers (a struct holding answer_start, typed as a sequence of null, and text, a sequence of strings), inputs (a string), and targets (a string). The data is divided into ten splits: train_qa, train_ic_qa, train_recite_qa, eval_qa, eval_ic_qa, eval_recite_qa, all_docs, all_docs_eval, train, and validation, each listed with its byte count and number of examples. A single default configuration maps every split to its data files under data/.
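As a minimal sketch of how one of these splits might be loaded, the snippet below uses `load_dataset` from the Hugging Face `datasets` library. It assumes the repository is still reachable on the Hub and that `datasets` is installed. The names `REPO_ID`, `SPLITS`, and `load_split` are hypothetical helpers introduced here for illustration and are not part of the dataset card.

```python
REPO_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_docidx_v3"

# Split names copied from the `splits` list in the dataset card.
SPLITS = [
    "train_qa", "train_ic_qa", "train_recite_qa",
    "eval_qa", "eval_ic_qa", "eval_recite_qa",
    "all_docs", "all_docs_eval", "train", "validation",
]


def load_split(split: str):
    """Load one split from the Hub (hypothetical helper; needs network access)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so this file can be imported without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, split=split)
```

If the load succeeds, an item such as `load_split("train_qa")[0]` should be a dict with the keys `answers`, `inputs`, and `targets`, matching the feature list above.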
Provider:
tyzhu
Summary of original information
Dataset overview
Dataset features
- answers:
  - answer_start: sequence of null
  - text: sequence of strings
- inputs: string
- targets: string
Dataset splits
- train_qa:
  - Bytes: 697367
  - Examples: 6000
- train_ic_qa:
  - Bytes: 4540536
  - Examples: 6000
- train_recite_qa:
  - Bytes: 4546536
  - Examples: 6000
- eval_qa:
  - Bytes: 752802
  - Examples: 6489
- eval_ic_qa:
  - Bytes: 4906186
  - Examples: 6489
- eval_recite_qa:
  - Bytes: 4912675
  - Examples: 6489
- all_docs:
  - Bytes: 7126313
  - Examples: 10925
- all_docs_eval:
  - Bytes: 7125701
  - Examples: 10925
- train:
  - Bytes: 4161737.5480059083
  - Examples: 10925
- validation:
  - Bytes: 752802
  - Examples: 6489
Dataset size
- Download size: 25667885 bytes
- Dataset size: 39522655.54800591 bytes
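The reported dataset size can be cross-checked against the per-split byte counts listed earlier. The sketch below sums those counts and compares the total with the stated figure. `SPLIT_BYTES` and `DATASET_SIZE` are illustrative names holding values copied from this page, and the tolerance is an arbitrary choice to absorb floating-point rounding.

```python
import math

# Per-split byte counts copied from the split list on this page.
SPLIT_BYTES = {
    "train_qa": 697367,
    "train_ic_qa": 4540536,
    "train_recite_qa": 4546536,
    "eval_qa": 752802,
    "eval_ic_qa": 4906186,
    "eval_recite_qa": 4912675,
    "all_docs": 7126313,
    "all_docs_eval": 7125701,
    "train": 4161737.5480059083,
    "validation": 752802,
}

# Total dataset size as reported on this page.
DATASET_SIZE = 39522655.54800591

total = sum(SPLIT_BYTES.values())
# Compare with a small absolute tolerance because one entry is fractional.
matches = math.isclose(total, DATASET_SIZE, rel_tol=0.0, abs_tol=1e-3)
print(f"sum of split sizes: {total} bytes; matches reported size: {matches}")
```

The download size (25667885 bytes) is smaller than this total, which is typical when the hosted files are stored in a compressed format.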
Configuration
- config_name: default
- data_files:
  - train_qa: data/train_qa-*
  - train_ic_qa: data/train_ic_qa-*
  - train_recite_qa: data/train_recite_qa-*
  - eval_qa: data/eval_qa-*
  - eval_ic_qa: data/eval_ic_qa-*
  - eval_recite_qa: data/eval_recite_qa-*
  - all_docs: data/all_docs-*
  - all_docs_eval: data/all_docs_eval-*
  - train: data/train-*
  - validation: data/validation-*



