
nthakur/mkqa-raft-instruct

Hugging Face, updated 2024-05-16, indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/nthakur/mkqa-raft-instruct
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 102496608
    num_examples: 6500
  download_size: 50730638
  dataset_size: 102496608
- config_name: de
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 62305522
    num_examples: 6500
  download_size: 37035774
  dataset_size: 62305522
- config_name: en
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 75045834
    num_examples: 6500
  download_size: 43109259
  dataset_size: 75045834
- config_name: es
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 71034084
    num_examples: 6500
  download_size: 41873672
  dataset_size: 71034084
- config_name: fi
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 61394960
    num_examples: 6500
  download_size: 36958512
  dataset_size: 61394960
- config_name: fr
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 61439186
    num_examples: 6500
  download_size: 35830558
  dataset_size: 61439186
- config_name: ja
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 76017085
    num_examples: 6500
  download_size: 42305829
  dataset_size: 76017085
- config_name: ko
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 67618027
    num_examples: 6500
  download_size: 39510692
  dataset_size: 67618027
- config_name: ru
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 113219755
    num_examples: 6500
  download_size: 56978371
  dataset_size: 113219755
- config_name: th
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 139954459
    num_examples: 6500
  download_size: 56017120
  dataset_size: 139954459
- config_name: zh
  features:
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: documents
    list:
    - name: docid
      dtype: string
    - name: text
      dtype: string
    - name: title
      dtype: string
  - name: gold_answer
    sequence: string
  splits:
  - name: train
    num_bytes: 58135559
    num_examples: 6500
  download_size: 39175049
  dataset_size: 58135559
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: de
  data_files:
  - split: train
    path: de/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---
Provider:
nthakur
Summary of original information

Dataset overview

Dataset configuration

  • config_name: the language configuration of the dataset, one of ar, de, en, es, fi, fr, ja, ko, ru, th, zh.
  • features: the fields of each record:
    • prompt: string.
    • query_id: string.
    • doc_ids: sequence of strings.
    • documents: list of entries, each containing:
      • docid: string.
      • text: string.
      • title: string.
    • gold_answer: sequence of strings.
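
To make the feature layout above concrete, here is a minimal Python sketch that checks whether a record has the documented fields and types. The field names come from the feature list; the `is_valid_record` helper and the sample record are hypothetical and invented purely for illustration.

```python
from typing import Any, Dict


def is_valid_record(record: Dict[str, Any]) -> bool:
    """Return True if `record` matches the documented feature layout."""

    def is_str_list(value: Any) -> bool:
        return isinstance(value, list) and all(isinstance(v, str) for v in value)

    # Top-level string fields.
    if not isinstance(record.get("prompt"), str):
        return False
    if not isinstance(record.get("query_id"), str):
        return False
    # Fields declared as sequences of strings.
    if not is_str_list(record.get("doc_ids")):
        return False
    if not is_str_list(record.get("gold_answer")):
        return False
    # `documents` is a list of entries with three string fields each.
    documents = record.get("documents")
    if not isinstance(documents, list):
        return False
    for doc in documents:
        if not isinstance(doc, dict):
            return False
        if not all(isinstance(doc.get(key), str) for key in ("docid", "text", "title")):
            return False
    return True


# Hypothetical record: every value here is made up for illustration.
example = {
    "prompt": "Answer the question using the documents provided.",
    "query_id": "q-0001",
    "doc_ids": ["d-1", "d-2"],
    "documents": [
        {"docid": "d-1", "text": "First passage.", "title": "Title A"},
        {"docid": "d-2", "text": "Second passage.", "title": "Title B"},
    ],
    "gold_answer": ["an answer"],
}

print(is_valid_record(example))
```

A checker like this is only a sanity test on shape; it says nothing about whether the `doc_ids` actually correspond to the entries in `documents`.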

Dataset splits

  • split: the dataset has a single split, train.
  • num_bytes: size of the train split, in bytes.
  • num_examples: number of examples in the train split.
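
Since each language is a separate configuration with a single train split, loading one language could look like the sketch below. It assumes the third-party Hugging Face `datasets` package is installed and that the repository ID matches the download link on this page; the `load_train_split` wrapper and the `CONFIG_NAMES` constant are hypothetical names introduced here.

```python
# Configuration names as listed on this page.
CONFIG_NAMES = ("ar", "de", "en", "es", "fi", "fr", "ja", "ko", "ru", "th", "zh")


def load_train_split(language: str):
    """Load the train split for one language configuration.

    Assumes the `datasets` package is installed and the Hub is reachable.
    """
    if language not in CONFIG_NAMES:
        raise ValueError(f"unknown configuration {language!r}; expected one of {CONFIG_NAMES}")
    # Imported lazily so the validation above works without the dependency.
    from datasets import load_dataset

    return load_dataset("nthakur/mkqa-raft-instruct", language, split="train")
```

Calling `load_train_split("en")` would download the English configuration on first use; validating the name first avoids a network round trip for a typo.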

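Dataset size and download size

  • download_size: size of the dataset download, in bytes.
  • dataset_size: actual size of the dataset, in bytes.

Data file paths

  • path: path to the train data files, in the form <language code>/train-*.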
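
The path convention above is a glob pattern per language. As a small sketch, the pattern can be built and matched with Python's standard `fnmatch` module. The file names in `files` are hypothetical; the page only states the `<language code>/train-*` pattern, not the actual shard names.

```python
from fnmatch import fnmatch


def train_pattern(language: str) -> str:
    """Build the glob pattern for one language's train files."""
    return f"{language}/train-*"


# Hypothetical repository listing, invented for illustration.
files = [
    "en/train-00000-of-00001.parquet",
    "zh/train-00000-of-00001.parquet",
    "en/README.md",
]

matches = [name for name in files if fnmatch(name, train_pattern("en"))]
print(matches)
```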

Dataset details

config_name  num_bytes (bytes)  num_examples  download_size (bytes)  dataset_size (bytes)
ar           102496608          6500          50730638               102496608
de           62305522           6500          37035774               62305522
en           75045834           6500          43109259               75045834
es           71034084           6500          41873672               71034084
fi           61394960           6500          36958512               61394960
fr           61439186           6500          35830558               61439186
ja           76017085           6500          42305829               76017085
ko           67618027           6500          39510692               67618027
ru           113219755          6500          56978371               113219755
th           139954459          6500          56017120               139954459
zh           58135559           6500          39175049               58135559
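
The per-configuration figures can be aggregated with a few lines of Python. The sketch below copies the numbers from the table and computes totals plus the ratio of download size to dataset size for each language; the variable names are illustrative and not part of the dataset.

```python
# (num_bytes, num_examples, download_size) per configuration, copied from the table.
# dataset_size equals num_bytes for every row, so it is not repeated here.
STATS = {
    "ar": (102496608, 6500, 50730638),
    "de": (62305522, 6500, 37035774),
    "en": (75045834, 6500, 43109259),
    "es": (71034084, 6500, 41873672),
    "fi": (61394960, 6500, 36958512),
    "fr": (61439186, 6500, 35830558),
    "ja": (76017085, 6500, 42305829),
    "ko": (67618027, 6500, 39510692),
    "ru": (113219755, 6500, 56978371),
    "th": (139954459, 6500, 56017120),
    "zh": (58135559, 6500, 39175049),
}

total_bytes = sum(num_bytes for num_bytes, _, _ in STATS.values())
total_examples = sum(examples for _, examples, _ in STATS.values())
total_download = sum(download for _, _, download in STATS.values())

# Ratio of download size to dataset size; lower means a smaller download
# relative to the dataset size.
ratios = {lang: download / num_bytes for lang, (num_bytes, _, download) in STATS.items()}
largest = max(STATS, key=lambda lang: STATS[lang][0])

print(f"total examples: {total_examples}")
print(f"total dataset size: {total_bytes / 1e6:.1f} MB")
print(f"total download size: {total_download / 1e6:.1f} MB")
print(f"largest configuration: {largest}")
for lang, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{lang}: download/dataset ratio {ratio:.2f}")
```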

The information above summarizes the dataset's basic structure, size, and distribution across configurations.
