
Ali-C137/Arabic-AYA

Source: Hugging Face. Updated 2024-03-14. Indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/Ali-C137/Arabic-AYA
Resource description:
---
dataset_info:
- config_name: CohereForAI-aya_collection-translated_cnn_dailymail
  features:
  - name: id
    dtype: int64
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sub_dataset_name
    dtype: string
  - name: task_type
    dtype: string
  - name: template_id
    dtype: int64
  - name: language
    dtype: string
  - name: script
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 3578924407
    num_examples: 1000000
  - name: test
    num_bytes: 415594340
    num_examples: 114900
  - name: validation
    num_bytes: 486698663
    num_examples: 133680
  download_size: 2209523190
  dataset_size: 4481217410
- config_name: CohereForAI-aya_collection-translated_soda
  features:
  - name: id
    dtype: int64
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sub_dataset_name
    dtype: string
  - name: task_type
    dtype: string
  - name: template_id
    dtype: int64
  - name: language
    dtype: string
  - name: script
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 6230916321
    num_examples: 11915820
  - name: test
    num_bytes: 777982873
    num_examples: 1489680
  - name: validation
    num_bytes: 772817056
    num_examples: 1463460
  download_size: 2804874077
  dataset_size: 7781716250
- config_name: CohereForAI-aya_collection-translated_wiki_split
  features:
  - name: id
    dtype: int64
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sub_dataset_name
    dtype: string
  - name: task_type
    dtype: string
  - name: template_id
    dtype: int64
  - name: language
    dtype: string
  - name: script
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 6349516377
    num_examples: 9899440
  - name: test
    num_bytes: 32058254
    num_examples: 50000
  - name: validation
    num_bytes: 32284536
    num_examples: 50000
  download_size: 2446037624
  dataset_size: 6413859167
configs:
- config_name: CohereForAI-aya_collection-translated_cnn_dailymail
  data_files:
  - split: train
    path: CohereForAI-aya_collection-translated_cnn_dailymail/train-*
  - split: test
    path: CohereForAI-aya_collection-translated_cnn_dailymail/test-*
  - split: validation
    path: CohereForAI-aya_collection-translated_cnn_dailymail/validation-*
- config_name: CohereForAI-aya_collection-translated_soda
  data_files:
  - split: train
    path: CohereForAI-aya_collection-translated_soda/train-*
  - split: test
    path: CohereForAI-aya_collection-translated_soda/test-*
  - split: validation
    path: CohereForAI-aya_collection-translated_soda/validation-*
- config_name: CohereForAI-aya_collection-translated_wiki_split
  data_files:
  - split: train
    path: CohereForAI-aya_collection-translated_wiki_split/train-*
  - split: test
    path: CohereForAI-aya_collection-translated_wiki_split/test-*
  - split: validation
    path: CohereForAI-aya_collection-translated_wiki_split/validation-*
---
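The card metadata names three configs, each with train, test, and validation splits. A minimal sketch of selecting one of them with the Hugging Face `datasets` library follows. The helper names `config_name` and `load_split` are hypothetical, and the sketch assumes the repository is still reachable on the Hub under the ID shown in the title; streaming is used here only because the train splits are several gigabytes each.

```python
from typing import Any

# Repository ID and config names are taken from the card metadata.
REPO_ID = "Ali-C137/Arabic-AYA"
CONFIG_PREFIX = "CohereForAI-aya_collection-translated_"
SUBSETS = ("cnn_dailymail", "soda", "wiki_split")
SPLITS = ("train", "test", "validation")


def config_name(subset: str) -> str:
    """Build the full config name from a short subset name (hypothetical helper)."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}; expected one of {SUBSETS}")
    return CONFIG_PREFIX + subset


def load_split(subset: str, split: str = "validation", streaming: bool = True) -> Any:
    """Load one split of one config (hypothetical helper).

    With streaming=True the rows are fetched lazily instead of downloading
    the whole config first.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so config_name() works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name(subset), split=split, streaming=streaming)


# Example usage (requires network access and the `datasets` package):
#   rows = load_split("wiki_split", "validation")
#   first = next(iter(rows))
#   print(first["inputs"][:80])
```

Passing `streaming=False` would instead download and cache the full config, which for these configs means roughly 2.2 to 2.8 GB each according to the listed download sizes.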
Provided by:
Ali-C137
Summary of original information

Dataset overview

Dataset 1: CohereForAI-aya_collection-translated_cnn_dailymail

  • Features:

    • id: int64
    • inputs: string
    • targets: string
    • dataset_name: string
    • sub_dataset_name: string
    • task_type: string
    • template_id: int64
    • language: string
    • script: string
    • split: string
  • Splits:

    • train: 1000000 examples, 3578924407 bytes
    • test: 114900 examples, 415594340 bytes
    • validation: 133680 examples, 486698663 bytes
  • Download size: 2209523190 bytes

  • Dataset size: 4481217410 bytes
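The feature list above describes a flat record of ten fields, two of them `int64` and the rest `string`. As an illustration, the sketch below checks a row against that schema in plain Python. The function name `schema_problems` is hypothetical, and mapping `int64` to Python `int` and `string` to `str` is an assumption about how rows arrive after loading.

```python
from typing import Any, Mapping

# Field names and dtypes as listed in the card.
# Assumption: int64 is surfaced as Python int, string as Python str.
EXPECTED_FIELDS: dict[str, type] = {
    "id": int,
    "inputs": str,
    "targets": str,
    "dataset_name": str,
    "sub_dataset_name": str,
    "task_type": str,
    "template_id": int,
    "language": str,
    "script": str,
    "split": str,
}


def schema_problems(record: Mapping[str, Any]) -> list[str]:
    """Return a list of human-readable schema violations (empty if none)."""
    problems: list[str] = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in record:
        if name not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems
```

A check like this is mainly useful when rows pass through custom preprocessing, where a field can be dropped or cast by accident.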

Dataset 2: CohereForAI-aya_collection-translated_soda

  • Features:

    • id: int64
    • inputs: string
    • targets: string
    • dataset_name: string
    • sub_dataset_name: string
    • task_type: string
    • template_id: int64
    • language: string
    • script: string
    • split: string
  • Splits:

    • train: 11915820 examples, 6230916321 bytes
    • test: 1489680 examples, 777982873 bytes
    • validation: 1463460 examples, 772817056 bytes
  • Download size: 2804874077 bytes

  • Dataset size: 7781716250 bytes

Dataset 3: CohereForAI-aya_collection-translated_wiki_split

  • Features:

    • id: int64
    • inputs: string
    • targets: string
    • dataset_name: string
    • sub_dataset_name: string
    • task_type: string
    • template_id: int64
    • language: string
    • script: string
    • split: string
  • Splits:

    • train: 9899440 examples, 6349516377 bytes
    • test: 50000 examples, 32058254 bytes
    • validation: 50000 examples, 32284536 bytes
  • Download size: 2446037624 bytes

  • Dataset size: 6413859167 bytes
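For each of the three datasets, the per-split byte counts listed above add up exactly to the stated dataset size. The sketch below reproduces that arithmetic and also derives the total example count and the download-to-dataset size ratio. The numbers are copied from the listing; the `CARD` structure and the `summarize` function are hypothetical names introduced for illustration.

```python
# (num_examples, num_bytes) per split, copied from the listing.
CARD = {
    "cnn_dailymail": {
        "splits": {
            "train": (1000000, 3578924407),
            "test": (114900, 415594340),
            "validation": (133680, 486698663),
        },
        "download_size": 2209523190,
        "dataset_size": 4481217410,
    },
    "soda": {
        "splits": {
            "train": (11915820, 6230916321),
            "test": (1489680, 777982873),
            "validation": (1463460, 772817056),
        },
        "download_size": 2804874077,
        "dataset_size": 7781716250,
    },
    "wiki_split": {
        "splits": {
            "train": (9899440, 6349516377),
            "test": (50000, 32058254),
            "validation": (50000, 32284536),
        },
        "download_size": 2446037624,
        "dataset_size": 6413859167,
    },
}


def summarize(entry: dict) -> dict:
    """Derive totals and consistency flags for one dataset entry."""
    total_examples = sum(n for n, _ in entry["splits"].values())
    total_bytes = sum(b for _, b in entry["splits"].values())
    return {
        "total_examples": total_examples,
        "split_bytes": total_bytes,
        # True when the split sizes add up to the stated dataset size.
        "sizes_consistent": total_bytes == entry["dataset_size"],
        # Fraction of the in-memory size that must be downloaded.
        "download_ratio": entry["download_size"] / entry["dataset_size"],
        "avg_bytes_per_example": total_bytes / total_examples,
    }


if __name__ == "__main__":
    for name, entry in CARD.items():
        s = summarize(entry)
        print(
            f"{name}: {s['total_examples']} examples, "
            f"consistent={s['sizes_consistent']}, "
            f"download/dataset={s['download_ratio']:.2f}"
        )
```

Across the three datasets the listing accounts for 26,116,980 examples in total, with soda contributing the most rows and cnn_dailymail the largest average record size.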
