
Cognitive-Lab/GoogleIndicGenBench_xorqa_in

Hugging Face · Updated 2024-06-03 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/Cognitive-Lab/GoogleIndicGenBench_xorqa_in
Resource description:
---
dataset_info:
- config_name: gu
  features:
  - name: translated_answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: context
    dtype: string
  - name: oracle_question
    dtype: string
  - name: split
    dtype: string
  - name: title
    dtype: string
  - name: lang
    dtype: string
  - name: answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 95606
    num_examples: 100
  - name: test
    num_bytes: 500609
    num_examples: 535
  - name: dev
    num_bytes: 463804
    num_examples: 499
  download_size: 584545
  dataset_size: 1060019
- config_name: hi
  features:
  - name: translated_answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: context
    dtype: string
  - name: oracle_question
    dtype: string
  - name: split
    dtype: string
  - name: title
    dtype: string
  - name: lang
    dtype: string
  - name: answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 95345
    num_examples: 100
  - name: test
    num_bytes: 505163
    num_examples: 539
  - name: dev
    num_bytes: 465260
    num_examples: 499
  download_size: 587339
  dataset_size: 1065768
- config_name: kn
  features:
  - name: translated_answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: context
    dtype: string
  - name: oracle_question
    dtype: string
  - name: split
    dtype: string
  - name: title
    dtype: string
  - name: lang
    dtype: string
  - name: answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 96439
    num_examples: 100
  - name: test
    num_bytes: 506353
    num_examples: 536
  - name: dev
    num_bytes: 470438
    num_examples: 500
  download_size: 591749
  dataset_size: 1073230
- config_name: ml
  features:
  - name: translated_answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: context
    dtype: string
  - name: oracle_question
    dtype: string
  - name: split
    dtype: string
  - name: title
    dtype: string
  - name: lang
    dtype: string
  - name: answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 97760
    num_examples: 100
  - name: test
    num_bytes: 516616
    num_examples: 537
  - name: dev
    num_bytes: 478673
    num_examples: 500
  download_size: 601968
  dataset_size: 1093049
- config_name: mr
  features:
  - name: translated_answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: context
    dtype: string
  - name: oracle_question
    dtype: string
  - name: split
    dtype: string
  - name: title
    dtype: string
  - name: lang
    dtype: string
  - name: answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 96115
    num_examples: 100
  - name: test
    num_bytes: 505346
    num_examples: 538
  - name: dev
    num_bytes: 467131
    num_examples: 498
  download_size: 592818
  dataset_size: 1068592
- config_name: ta
  features:
  - name: translated_answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: context
    dtype: string
  - name: oracle_question
    dtype: string
  - name: split
    dtype: string
  - name: title
    dtype: string
  - name: lang
    dtype: string
  - name: answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 98324
    num_examples: 100
  - name: test
    num_bytes: 520219
    num_examples: 538
  - name: dev
    num_bytes: 481205
    num_examples: 500
  download_size: 594272
  dataset_size: 1099748
- config_name: te
  features:
  - name: translated_answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: context
    dtype: string
  - name: oracle_question
    dtype: string
  - name: split
    dtype: string
  - name: title
    dtype: string
  - name: lang
    dtype: string
  - name: answers
    list:
    - name: answer_start
      dtype: int64
    - name: text
      dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 95599
    num_examples: 100
  - name: test
    num_bytes: 499460
    num_examples: 539
  - name: dev
    num_bytes: 469737
    num_examples: 500
  download_size: 585197
  dataset_size: 1064796
configs:
- config_name: gu
  data_files:
  - split: train
    path: gu/train-*
  - split: test
    path: gu/test-*
  - split: dev
    path: gu/dev-*
- config_name: hi
  data_files:
  - split: train
    path: hi/train-*
  - split: test
    path: hi/test-*
  - split: dev
    path: hi/dev-*
- config_name: kn
  data_files:
  - split: train
    path: kn/train-*
  - split: test
    path: kn/test-*
  - split: dev
    path: kn/dev-*
- config_name: ml
  data_files:
  - split: train
    path: ml/train-*
  - split: test
    path: ml/test-*
  - split: dev
    path: ml/dev-*
- config_name: mr
  data_files:
  - split: train
    path: mr/train-*
  - split: test
    path: mr/test-*
  - split: dev
    path: mr/dev-*
- config_name: ta
  data_files:
  - split: train
    path: ta/train-*
  - split: test
    path: ta/test-*
  - split: dev
    path: ta/dev-*
- config_name: te
  data_files:
  - split: train
    path: te/train-*
  - split: test
    path: te/test-*
  - split: dev
    path: te/dev-*
---

This dataset provides seven language configurations (gu, hi, kn, ml, mr, ta, te). Every record carries the same fields: translated_answers, context, oracle_question, split, title, lang, answers, and question. Each configuration is divided into train, test, and dev splits.
Provider:
Cognitive-Lab
Summary of original information

Dataset overview

Dataset configurations and features

  • Configuration names: gu, hi, kn, ml, mr, ta, te
  • Features:
    • translated_answers:
      • answer_start: int64
      • text: string
    • context: string
    • oracle_question: string
    • split: string
    • title: string
    • lang: string
    • answers:
      • answer_start: int64
      • text: string
    • question: string
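
The feature list above can be expressed as a small schema check. The following is a minimal sketch, not part of the dataset's own tooling: the function name `schema_errors` and the placeholder record are hypothetical, and the check covers field names and types only, since the listing does not say how `answer_start` relates to `context`.

```python
from typing import Any

# Field names and types as given in the feature list.
ANSWER_FIELDS = {"answer_start": int, "text": str}
STRING_FIELDS = ("context", "oracle_question", "split", "title", "lang", "question")
ANSWER_LIST_FIELDS = ("translated_answers", "answers")


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return human-readable schema problems; an empty list means the record fits."""
    errors = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            errors.append(f"{name}: expected string")
    for name in ANSWER_LIST_FIELDS:
        items = record.get(name)
        if not isinstance(items, list):
            errors.append(f"{name}: expected list")
            continue
        for i, item in enumerate(items):
            if not isinstance(item, dict):
                errors.append(f"{name}[{i}]: expected mapping")
                continue
            for key, expected in ANSWER_FIELDS.items():
                if not isinstance(item.get(key), expected):
                    errors.append(f"{name}[{i}].{key}: expected {expected.__name__}")
    return errors


# Hypothetical placeholder record (made-up values, not taken from the dataset).
EXAMPLE = {
    "translated_answers": [{"answer_start": 0, "text": "placeholder"}],
    "context": "placeholder passage",
    "oracle_question": "placeholder question",
    "split": "dev",
    "title": "placeholder title",
    "lang": "hi",
    "answers": [{"answer_start": 0, "text": "placeholder"}],
    "question": "placeholder question",
}

print(schema_errors(EXAMPLE))
```

A record missing a field, or carrying a non-integer `answer_start`, would produce one message per problem, which makes the check usable as a quick sanity pass before evaluation.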

Dataset splits and sizes

  • Splits: train, test, dev
  • Sizes:
    • gu:
      • train: 95606 bytes, 100 examples
      • test: 500609 bytes, 535 examples
      • dev: 463804 bytes, 499 examples
      • download_size: 584545 bytes
      • dataset_size: 1060019 bytes
    • hi:
      • train: 95345 bytes, 100 examples
      • test: 505163 bytes, 539 examples
      • dev: 465260 bytes, 499 examples
      • download_size: 587339 bytes
      • dataset_size: 1065768 bytes
    • kn:
      • train: 96439 bytes, 100 examples
      • test: 506353 bytes, 536 examples
      • dev: 470438 bytes, 500 examples
      • download_size: 591749 bytes
      • dataset_size: 1073230 bytes
    • ml:
      • train: 97760 bytes, 100 examples
      • test: 516616 bytes, 537 examples
      • dev: 478673 bytes, 500 examples
      • download_size: 601968 bytes
      • dataset_size: 1093049 bytes
    • mr:
      • train: 96115 bytes, 100 examples
      • test: 505346 bytes, 538 examples
      • dev: 467131 bytes, 498 examples
      • download_size: 592818 bytes
      • dataset_size: 1068592 bytes
    • ta:
      • train: 98324 bytes, 100 examples
      • test: 520219 bytes, 538 examples
      • dev: 481205 bytes, 500 examples
      • download_size: 594272 bytes
      • dataset_size: 1099748 bytes
    • te:
      • train: 95599 bytes, 100 examples
      • test: 499460 bytes, 539 examples
      • dev: 469737 bytes, 500 examples
      • download_size: 585197 bytes
      • dataset_size: 1064796 bytes
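
The figures above are internally consistent: in every configuration the reported dataset_size equals the sum of the three split sizes. The sketch below re-derives that from the listed numbers; the dictionary layout and function names are illustrative choices, not part of the dataset.

```python
# (num_bytes, num_examples) per split, copied from the size listing.
SPLITS = {
    "gu": {"train": (95606, 100), "test": (500609, 535), "dev": (463804, 499)},
    "hi": {"train": (95345, 100), "test": (505163, 539), "dev": (465260, 499)},
    "kn": {"train": (96439, 100), "test": (506353, 536), "dev": (470438, 500)},
    "ml": {"train": (97760, 100), "test": (516616, 537), "dev": (478673, 500)},
    "mr": {"train": (96115, 100), "test": (505346, 538), "dev": (467131, 498)},
    "ta": {"train": (98324, 100), "test": (520219, 538), "dev": (481205, 500)},
    "te": {"train": (95599, 100), "test": (499460, 539), "dev": (469737, 500)},
}

# Reported dataset_size per configuration, copied from the same listing.
DATASET_SIZE = {
    "gu": 1060019, "hi": 1065768, "kn": 1073230, "ml": 1093049,
    "mr": 1068592, "ta": 1099748, "te": 1064796,
}


def total_bytes(config: str) -> int:
    """Sum of num_bytes over the train, test, and dev splits."""
    return sum(num_bytes for num_bytes, _ in SPLITS[config].values())


def total_examples(config: str) -> int:
    """Sum of num_examples over the train, test, and dev splits."""
    return sum(num_examples for _, num_examples in SPLITS[config].values())


for config in SPLITS:
    matches = total_bytes(config) == DATASET_SIZE[config]
    print(config, total_examples(config), "examples, size matches:", matches)
```

Summed over all seven configurations this gives 7,958 examples, of which only 100 per language are in the train split.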

Data file paths

  • gu:
    • train: gu/train-*
    • test: gu/test-*
    • dev: gu/dev-*
  • hi:
    • train: hi/train-*
    • test: hi/test-*
    • dev: hi/dev-*
  • kn:
    • train: kn/train-*
    • test: kn/test-*
    • dev: kn/dev-*
  • ml:
    • train: ml/train-*
    • test: ml/test-*
    • dev: ml/dev-*
  • mr:
    • train: mr/train-*
    • test: mr/test-*
    • dev: mr/dev-*
  • ta:
    • train: ta/train-*
    • test: ta/test-*
    • dev: ta/dev-*
  • te:
    • train: te/train-*
    • test: te/test-*
    • dev: te/dev-*
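
The path patterns above follow a single `<config>/<split>-*` scheme, so they can be generated rather than listed. The sketch below does that and, as an assumption about typical usage, shows how one split might be loaded with the Hugging Face `datasets` library; the helper names `data_file_patterns` and `load_split` are hypothetical, and the loader needs network access and the third-party package, so it is imported lazily.

```python
REPO_ID = "Cognitive-Lab/GoogleIndicGenBench_xorqa_in"
CONFIGS = ("gu", "hi", "kn", "ml", "mr", "ta", "te")
SPLITS = ("train", "test", "dev")


def data_file_patterns(config: str) -> dict[str, str]:
    """Map each split name to its file glob, e.g. 'hi/dev-*'."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return {split: f"{config}/{split}-*" for split in SPLITS}


def load_split(config: str, split: str = "dev"):
    """Load one split of one language config (requires `datasets` and network)."""
    if config not in CONFIGS or split not in SPLITS:
        raise ValueError(f"unknown config or split: {config!r}, {split!r}")
    from datasets import load_dataset  # third-party, imported only when needed

    return load_dataset(REPO_ID, config, split=split)


print(data_file_patterns("hi"))
```

Calling `load_split("hi", "dev")` would then return the Hindi dev split, assuming the repository is reachable under the ID shown in the title.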