nthakur/nomiracl-raft-instruct-mistral

Hugging Face | Updated 2024-04-11 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/nthakur/nomiracl-raft-instruct-mistral
Resource description:
---
dataset_info:
- config_name: ar
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2735079
    num_examples: 303
  download_size: 1140409
  dataset_size: 2735079
- config_name: bn
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 10136389
    num_examples: 821
  download_size: 3196811
  dataset_size: 10136389
- config_name: en
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 1520285
    num_examples: 255
  download_size: 737245
  dataset_size: 1520285
- config_name: es
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 1702736
    num_examples: 325
  download_size: 800383
  dataset_size: 1702736
- config_name: fa
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 10060118
    num_examples: 1397
  download_size: 3672376
  dataset_size: 10060118
- config_name: fi
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 613801
    num_examples: 112
  download_size: 338235
  dataset_size: 613801
- config_name: fr
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 5131163
    num_examples: 1005
  download_size: 2330304
  dataset_size: 5131163
- config_name: hi
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 19409785
    num_examples: 1845
  download_size: 6047001
  dataset_size: 19409785
- config_name: id
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2916188
    num_examples: 545
  download_size: 1369968
  dataset_size: 2916188
- config_name: ja
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2253271
    num_examples: 367
  download_size: 1008667
  dataset_size: 2253271
- config_name: ko
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 8068021
    num_examples: 1453
  download_size: 3730192
  dataset_size: 8068021
- config_name: ru
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 3433771
    num_examples: 361
  download_size: 1423396
  dataset_size: 3433771
- config_name: sw
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 2770975
    num_examples: 576
  download_size: 1278519
  dataset_size: 2770975
- config_name: te
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 1619283
    num_examples: 167
  download_size: 557845
  dataset_size: 1619283
- config_name: th
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 6090937
    num_examples: 543
  download_size: 1914954
  dataset_size: 6090937
- config_name: zh
  features:
  - name: output
    list:
    - name: model
      dtype: string
    - name: output
      dtype: string
  - name: prompt
    dtype: string
  - name: query_id
    dtype: string
  - name: doc_ids
    sequence: string
  - name: positive_ids
    sequence: string
  - name: negative_ids
    sequence: 'null'
  splits:
  - name: train
    num_bytes: 5992472
    num_examples: 1409
  download_size: 2950247
  dataset_size: 5992472
configs:
- config_name: ar
  data_files:
  - split: train
    path: ar/train-*
- config_name: bn
  data_files:
  - split: train
    path: bn/train-*
- config_name: en
  data_files:
  - split: train
    path: en/train-*
- config_name: es
  data_files:
  - split: train
    path: es/train-*
- config_name: fa
  data_files:
  - split: train
    path: fa/train-*
- config_name: fi
  data_files:
  - split: train
    path: fi/train-*
- config_name: fr
  data_files:
  - split: train
    path: fr/train-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
- config_name: id
  data_files:
  - split: train
    path: id/train-*
- config_name: ja
  data_files:
  - split: train
    path: ja/train-*
- config_name: ko
  data_files:
  - split: train
    path: ko/train-*
- config_name: ru
  data_files:
  - split: train
    path: ru/train-*
- config_name: sw
  data_files:
  - split: train
    path: sw/train-*
- config_name: te
  data_files:
  - split: train
    path: te/train-*
- config_name: th
  data_files:
  - split: train
    path: th/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/train-*
---
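The card metadata declares one configuration per language code, each with a single train split stored under a `<code>/train-*` file pattern. A minimal sketch of loading one language follows, assuming the Hugging Face `datasets` library is installed and the hub is reachable. The helper names `data_file_pattern` and `load_language` are illustrative and not part of the dataset.

```python
REPO_ID = "nthakur/nomiracl-raft-instruct-mistral"

# The 16 config names listed in the card metadata.
CONFIGS = ["ar", "bn", "en", "es", "fa", "fi", "fr", "hi",
           "id", "ja", "ko", "ru", "sw", "te", "th", "zh"]


def data_file_pattern(config: str, split: str = "train") -> str:
    """Return the glob the card declares for a config, e.g. 'ar/train-*'."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if split != "train":
        raise ValueError("the card only declares a 'train' split")
    return f"{config}/{split}-*"


def load_language(config: str):
    """Load one language's train split (hypothetical helper; needs network)."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    data_file_pattern(config)  # raises early on an unknown config name
    return load_dataset(REPO_ID, config, split="train")
```

If the hub copy still matches the card, `load_language("en")` should return a dataset with 255 rows.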
Provider:
nthakur
Summary of original information

Dataset overview

Dataset configurations and features

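  1. Configuration names

    • One configuration per language, 16 in total: ar, bn, en, es, fa, fi, fr, hi, id, ja, ko, ru, sw, te, th, zh
  2. Features

    • Each configuration has the following features:
      • output: a list of entries, each with the fields model and output, both of type string
      • prompt: string
      • query_id: string
      • doc_ids: sequence of string
      • positive_ids: sequence of string
      • negative_ids: sequence of null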
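The feature list above can be turned into a small record check. This is a sketch under assumptions: `validate_record` is a hypothetical helper, and `negative_ids` is treated as a list whose entries (if any) are all `None`, which is one reading of `sequence: 'null'`. The actual rows may differ.

```python
from typing import Any


def _is_str_list(value: Any) -> bool:
    """True if value is a list containing only strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_record(record: dict) -> list:
    """Return a list of schema problems; an empty list means the record fits."""
    problems = []
    output = record.get("output")
    if not isinstance(output, list):
        problems.append("output must be a list")
    else:
        for i, item in enumerate(output):
            ok = (isinstance(item, dict)
                  and isinstance(item.get("model"), str)
                  and isinstance(item.get("output"), str))
            if not ok:
                problems.append(f"output[{i}] needs string 'model' and 'output'")
    for key in ("prompt", "query_id"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} must be a string")
    for key in ("doc_ids", "positive_ids"):
        if not _is_str_list(record.get(key)):
            problems.append(f"{key} must be a list of strings")
    negatives = record.get("negative_ids")
    # Assumption: a null-typed sequence is an empty list or a list of None.
    if not (isinstance(negatives, list) and all(v is None for v in negatives)):
        problems.append("negative_ids must be a list of nulls")
    return problems
```

Running it over a few rows after loading is a cheap way to confirm that a local copy still matches the declared schema.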

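Dataset size and splits

  • Training set

    • Training-set size and number of examples per configuration:
      • ar: 2735079 bytes, 303 examples.
      • bn: 10136389 bytes, 821 examples.
      • en: 1520285 bytes, 255 examples.
      • es: 1702736 bytes, 325 examples.
      • fa: 10060118 bytes, 1397 examples.
      • fi: 613801 bytes, 112 examples.
      • fr: 5131163 bytes, 1005 examples.
      • hi: 19409785 bytes, 1845 examples.
      • id: 2916188 bytes, 545 examples.
      • ja: 2253271 bytes, 367 examples.
      • ko: 8068021 bytes, 1453 examples.
      • ru: 3433771 bytes, 361 examples.
      • sw: 2770975 bytes, 576 examples.
      • te: 1619283 bytes, 167 examples.
      • th: 6090937 bytes, 543 examples.
      • zh: 5992472 bytes, 1409 examples.
  • Download size

    • Download size per configuration:
      • ar: 1140409 bytes.
      • bn: 3196811 bytes.
      • en: 737245 bytes.
      • es: 800383 bytes.
      • fa: 3672376 bytes.
      • fi: 338235 bytes.
      • fr: 2330304 bytes.
      • hi: 6047001 bytes.
      • id: 1369968 bytes.
      • ja: 1008667 bytes.
      • ko: 3730192 bytes.
      • ru: 1423396 bytes.
      • sw: 1278519 bytes.
      • te: 557845 bytes.
      • th: 1914954 bytes.
      • zh: 2950247 bytes.
  • Total dataset size

    • For every configuration, the total dataset size equals the training-set size.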
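The per-configuration figures above are easier to compare once aggregated. The sketch below copies the numbers from the card and derives totals, average bytes per example, and the ratio of download size to dataset size. The names `SIZES` and `summarize` are illustrative.

```python
# (num_bytes, num_examples, download_size) per config, copied from the card.
SIZES = {
    "ar": (2735079, 303, 1140409),
    "bn": (10136389, 821, 3196811),
    "en": (1520285, 255, 737245),
    "es": (1702736, 325, 800383),
    "fa": (10060118, 1397, 3672376),
    "fi": (613801, 112, 338235),
    "fr": (5131163, 1005, 2330304),
    "hi": (19409785, 1845, 6047001),
    "id": (2916188, 545, 1369968),
    "ja": (2253271, 367, 1008667),
    "ko": (8068021, 1453, 3730192),
    "ru": (3433771, 361, 1423396),
    "sw": (2770975, 576, 1278519),
    "te": (1619283, 167, 557845),
    "th": (6090937, 543, 1914954),
    "zh": (5992472, 1409, 2950247),
}


def summarize(sizes):
    """Return (total_examples, total_bytes, per-config derived stats)."""
    total_examples = sum(n for _, n, _ in sizes.values())
    total_bytes = sum(b for b, _, _ in sizes.values())
    per_config = {
        name: {
            "bytes_per_example": b / n,   # average in-memory size of one row
            "download_ratio": d / b,      # download size relative to dataset size
        }
        for name, (b, n, d) in sizes.items()
    }
    return total_examples, total_bytes, per_config


if __name__ == "__main__":
    total_examples, total_bytes, stats = summarize(SIZES)
    print(f"{total_examples} examples, {total_bytes} bytes in total")
    for name, s in sorted(stats.items()):
        print(f"{name}: {s['bytes_per_example']:.0f} B/example, "
              f"download ratio {s['download_ratio']:.2f}")
```

Summed over all 16 configurations, the card's figures give 11484 training examples.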