tyzhu/lmind_nq_train6000_eval6489_v1_docidx_v3

Source: Hugging Face. Updated 2024-06-04; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/tyzhu/lmind_nq_train6000_eval6489_v1_docidx_v3
Resource overview:
---
dataset_info:
  features:
  - name: answers
    struct:
    - name: answer_start
      sequence: 'null'
    - name: text
      sequence: string
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  splits:
  - name: train_qa
    num_bytes: 697367
    num_examples: 6000
  - name: train_ic_qa
    num_bytes: 4540536
    num_examples: 6000
  - name: train_recite_qa
    num_bytes: 4546536
    num_examples: 6000
  - name: eval_qa
    num_bytes: 752802
    num_examples: 6489
  - name: eval_ic_qa
    num_bytes: 4906186
    num_examples: 6489
  - name: eval_recite_qa
    num_bytes: 4912675
    num_examples: 6489
  - name: all_docs
    num_bytes: 7126313
    num_examples: 10925
  - name: all_docs_eval
    num_bytes: 7125701
    num_examples: 10925
  - name: train
    num_bytes: 4161737.5480059083
    num_examples: 10925
  - name: validation
    num_bytes: 752802
    num_examples: 6489
  download_size: 25667885
  dataset_size: 39522655.54800591
configs:
- config_name: default
  data_files:
  - split: train_qa
    path: data/train_qa-*
  - split: train_ic_qa
    path: data/train_ic_qa-*
  - split: train_recite_qa
    path: data/train_recite_qa-*
  - split: eval_qa
    path: data/eval_qa-*
  - split: eval_ic_qa
    path: data/eval_ic_qa-*
  - split: eval_recite_qa
    path: data/eval_recite_qa-*
  - split: all_docs
    path: data/all_docs-*
  - split: all_docs_eval
    path: data/all_docs_eval-*
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---

This dataset is designed for question answering. Each record has three features: answers (a struct with an answer_start sequence and a text sequence of strings), inputs (a string), and targets (a string). Note that answer_start is typed as a sequence of null values, so the schema does not store actual start positions. The data is divided into ten splits, including train_qa, train_ic_qa, and train_recite_qa along with their eval counterparts, and each split lists its size in bytes and its number of examples. A single configuration, default, maps every split to its data file path.
Provider:
tyzhu
Summary of original information

Dataset overview

Dataset features

  • answers:
    • answer_start: sequence of null
    • text: sequence of strings
  • inputs: string
  • targets: string
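
To make the feature listing concrete, here is a minimal sketch of a check that a record has the shape described above. The function name `matches_schema` and the sample record are hypothetical; the values are placeholders and are not taken from the dataset.

```python
from typing import Any


def matches_schema(record: Any) -> bool:
    """Return True if `record` has the shape given in the feature listing."""
    if not isinstance(record, dict):
        return False
    answers = record.get("answers")
    if not isinstance(answers, dict):
        return False
    starts = answers.get("answer_start")
    texts = answers.get("text")
    # answer_start is typed as a sequence of null, so every entry must be None.
    if not isinstance(starts, list) or any(s is not None for s in starts):
        return False
    # text is a sequence of strings.
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        return False
    # inputs and targets are plain strings.
    return isinstance(record.get("inputs"), str) and isinstance(
        record.get("targets"), str
    )


# Hypothetical record with placeholder values, not real dataset content.
example = {
    "answers": {"answer_start": [None], "text": ["placeholder answer"]},
    "inputs": "placeholder question",
    "targets": "placeholder answer",
}
print(matches_schema(example))
```

A check like this could be run over a few rows after loading a split to confirm that the data agrees with the listed schema.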

Dataset splits

  • train_qa:
    • Bytes: 697367
    • Examples: 6000
  • train_ic_qa:
    • Bytes: 4540536
    • Examples: 6000
  • train_recite_qa:
    • Bytes: 4546536
    • Examples: 6000
  • eval_qa:
    • Bytes: 752802
    • Examples: 6489
  • eval_ic_qa:
    • Bytes: 4906186
    • Examples: 6489
  • eval_recite_qa:
    • Bytes: 4912675
    • Examples: 6489
  • all_docs:
    • Bytes: 7126313
    • Examples: 10925
  • all_docs_eval:
    • Bytes: 7125701
    • Examples: 10925
  • train:
    • Bytes: 4161737.5480059083
    • Examples: 10925
  • validation:
    • Bytes: 752802
    • Examples: 6489

Dataset size

  • Download size: 25667885 bytes
  • Dataset size: 39522655.54800591 bytes
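
The reported dataset size appears to be the sum of the per-split byte counts, which also accounts for its fractional part (the train split lists a fractional byte count). The sketch below checks that arithmetic using only the figures listed on this page. The variable names are illustrative, and reading the smaller download size as the effect of compressed storage is an assumption, not something the page states.

```python
import math

# Per-split byte counts, copied from the split listing.
SPLIT_NUM_BYTES = {
    "train_qa": 697367,
    "train_ic_qa": 4540536,
    "train_recite_qa": 4546536,
    "eval_qa": 752802,
    "eval_ic_qa": 4906186,
    "eval_recite_qa": 4912675,
    "all_docs": 7126313,
    "all_docs_eval": 7125701,
    "train": 4161737.5480059083,
    "validation": 752802,
}
DATASET_SIZE = 39522655.54800591  # reported dataset size in bytes
DOWNLOAD_SIZE = 25667885          # reported download size in bytes

total = sum(SPLIT_NUM_BYTES.values())
# Compare with a small absolute tolerance because one term is a float.
sizes_agree = math.isclose(total, DATASET_SIZE, rel_tol=0.0, abs_tol=1e-3)
print("sum of splits matches dataset size:", sizes_agree)
print(f"download size / dataset size: {DOWNLOAD_SIZE / DATASET_SIZE:.3f}")
```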

Configuration

  • config_name: default
    • data_files:
      • train_qa: data/train_qa-*
      • train_ic_qa: data/train_ic_qa-*
      • train_recite_qa: data/train_recite_qa-*
      • eval_qa: data/eval_qa-*
      • eval_ic_qa: data/eval_ic_qa-*
      • eval_recite_qa: data/eval_recite_qa-*
      • all_docs: data/all_docs-*
      • all_docs_eval: data/all_docs_eval-*
      • train: data/train-*
      • validation: data/validation-*
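
As a rough illustration of how this configuration is used, the sketch below rebuilds the split-to-path mapping and wraps the `load_dataset` function from the Hugging Face `datasets` library. It assumes the repository is still published on the Hub under the ID taken from the download link, and that the `datasets` package and network access are available. The helper name `load_split` is hypothetical.

```python
REPO_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_docidx_v3"

SPLITS = [
    "train_qa", "train_ic_qa", "train_recite_qa",
    "eval_qa", "eval_ic_qa", "eval_recite_qa",
    "all_docs", "all_docs_eval", "train", "validation",
]

# Mirrors the data_files listing: each split reads from data/<split>-*.
DATA_FILES = {split: f"data/{split}-*" for split in SPLITS}


def load_split(split: str):
    """Load one split by name (hypothetical helper; needs network access)."""
    if split not in DATA_FILES:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the mapping above is usable without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


for name, pattern in DATA_FILES.items():
    print(f"{name}: {pattern}")
```

Calling `load_split("train_qa")` would then return that split, provided the assumptions above hold. Because there is only the default configuration, no configuration name needs to be passed.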