Ali-C137/Arabic-AYA
Source: Hugging Face. Updated 2024-03-14; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/Ali-C137/Arabic-AYA
Resource description:
---
dataset_info:
- config_name: CohereForAI-aya_collection-translated_cnn_dailymail
  features:
  - name: id
    dtype: int64
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sub_dataset_name
    dtype: string
  - name: task_type
    dtype: string
  - name: template_id
    dtype: int64
  - name: language
    dtype: string
  - name: script
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 3578924407
    num_examples: 1000000
  - name: test
    num_bytes: 415594340
    num_examples: 114900
  - name: validation
    num_bytes: 486698663
    num_examples: 133680
  download_size: 2209523190
  dataset_size: 4481217410
- config_name: CohereForAI-aya_collection-translated_soda
  features:
  - name: id
    dtype: int64
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sub_dataset_name
    dtype: string
  - name: task_type
    dtype: string
  - name: template_id
    dtype: int64
  - name: language
    dtype: string
  - name: script
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 6230916321
    num_examples: 11915820
  - name: test
    num_bytes: 777982873
    num_examples: 1489680
  - name: validation
    num_bytes: 772817056
    num_examples: 1463460
  download_size: 2804874077
  dataset_size: 7781716250
- config_name: CohereForAI-aya_collection-translated_wiki_split
  features:
  - name: id
    dtype: int64
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: dataset_name
    dtype: string
  - name: sub_dataset_name
    dtype: string
  - name: task_type
    dtype: string
  - name: template_id
    dtype: int64
  - name: language
    dtype: string
  - name: script
    dtype: string
  - name: split
    dtype: string
  splits:
  - name: train
    num_bytes: 6349516377
    num_examples: 9899440
  - name: test
    num_bytes: 32058254
    num_examples: 50000
  - name: validation
    num_bytes: 32284536
    num_examples: 50000
  download_size: 2446037624
  dataset_size: 6413859167
configs:
- config_name: CohereForAI-aya_collection-translated_cnn_dailymail
  data_files:
  - split: train
    path: CohereForAI-aya_collection-translated_cnn_dailymail/train-*
  - split: test
    path: CohereForAI-aya_collection-translated_cnn_dailymail/test-*
  - split: validation
    path: CohereForAI-aya_collection-translated_cnn_dailymail/validation-*
- config_name: CohereForAI-aya_collection-translated_soda
  data_files:
  - split: train
    path: CohereForAI-aya_collection-translated_soda/train-*
  - split: test
    path: CohereForAI-aya_collection-translated_soda/test-*
  - split: validation
    path: CohereForAI-aya_collection-translated_soda/validation-*
- config_name: CohereForAI-aya_collection-translated_wiki_split
  data_files:
  - split: train
    path: CohereForAI-aya_collection-translated_wiki_split/train-*
  - split: test
    path: CohereForAI-aya_collection-translated_wiki_split/test-*
  - split: validation
    path: CohereForAI-aya_collection-translated_wiki_split/validation-*
---
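The card above declares three configs, each with train, test, and validation splits. The following is a minimal sketch of loading one of them with the Hugging Face `datasets` library. It assumes the repository ID `Ali-C137/Arabic-AYA` (taken from the page title) is reachable on the Hub and that the `datasets` package is installed; `load_config` and `preview` are hypothetical helper names introduced here for illustration, not part of the dataset or the library.

```python
REPO_ID = "Ali-C137/Arabic-AYA"  # assumption: Hub repo ID matches the page title

# Config names and split names copied from the card's `configs` section.
CONFIG_NAMES = [
    "CohereForAI-aya_collection-translated_cnn_dailymail",
    "CohereForAI-aya_collection-translated_soda",
    "CohereForAI-aya_collection-translated_wiki_split",
]
SPLITS = ("train", "test", "validation")


def load_config(config_name, split="validation", streaming=True):
    """Load one split of one config from the Hub (requires network access).

    With streaming=True the rows are read lazily instead of downloading
    the full multi-gigabyte config first.
    """
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split}")
    # Imported lazily so the constants above can be used without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split, streaming=streaming)


def preview(config_name, split="validation", max_chars=200):
    """Print the start of the first row's `inputs` and `targets` fields."""
    first_row = next(iter(load_config(config_name, split=split)))
    print(first_row["inputs"][:max_chars])
    print(first_row["targets"][:max_chars])
```

Calling `preview(CONFIG_NAMES[2])` would stream the first validation row of the wiki_split config. Streaming is chosen here because the smallest config is listed at roughly 2.2 GB to download.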
Provider:
Ali-C137
Summary of original information
Dataset overview
Dataset 1: CohereForAI-aya_collection-translated_cnn_dailymail
- Features:
  - id: int64
  - inputs: string
  - targets: string
  - dataset_name: string
  - sub_dataset_name: string
  - task_type: string
  - template_id: int64
  - language: string
  - script: string
  - split: string
- Splits:
  - train: 1000000 examples, 3578924407 bytes
  - test: 114900 examples, 415594340 bytes
  - validation: 133680 examples, 486698663 bytes
- Download size: 2209523190 bytes
- Dataset size: 4481217410 bytes
Dataset 2: CohereForAI-aya_collection-translated_soda
- Features:
  - id: int64
  - inputs: string
  - targets: string
  - dataset_name: string
  - sub_dataset_name: string
  - task_type: string
  - template_id: int64
  - language: string
  - script: string
  - split: string
- Splits:
  - train: 11915820 examples, 6230916321 bytes
  - test: 1489680 examples, 777982873 bytes
  - validation: 1463460 examples, 772817056 bytes
- Download size: 2804874077 bytes
- Dataset size: 7781716250 bytes
Dataset 3: CohereForAI-aya_collection-translated_wiki_split
- Features:
  - id: int64
  - inputs: string
  - targets: string
  - dataset_name: string
  - sub_dataset_name: string
  - task_type: string
  - template_id: int64
  - language: string
  - script: string
  - split: string
- Splits:
  - train: 9899440 examples, 6349516377 bytes
  - test: 50000 examples, 32058254 bytes
  - validation: 50000 examples, 32284536 bytes
- Download size: 2446037624 bytes
- Dataset size: 6413859167 bytes
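The split figures listed above can be cross-checked against the reported dataset sizes. The sketch below copies the numbers from this page into a plain dictionary and confirms that each config's split byte counts add up to its dataset size; it also derives totals and the download-to-dataset size ratio. The short keys (`cnn_dailymail`, `soda`, `wiki_split`) and the function names are illustrative choices, not identifiers from the dataset.

```python
# (num_bytes, num_examples) per split, plus reported sizes, copied from this page.
CARD = {
    "cnn_dailymail": {
        "splits": {
            "train": (3578924407, 1000000),
            "test": (415594340, 114900),
            "validation": (486698663, 133680),
        },
        "download_size": 2209523190,
        "dataset_size": 4481217410,
    },
    "soda": {
        "splits": {
            "train": (6230916321, 11915820),
            "test": (777982873, 1489680),
            "validation": (772817056, 1463460),
        },
        "download_size": 2804874077,
        "dataset_size": 7781716250,
    },
    "wiki_split": {
        "splits": {
            "train": (6349516377, 9899440),
            "test": (32058254, 50000),
            "validation": (32284536, 50000),
        },
        "download_size": 2446037624,
        "dataset_size": 6413859167,
    },
}


def summarize(entry):
    """Return derived totals for one config entry."""
    total_bytes = sum(nbytes for nbytes, _ in entry["splits"].values())
    total_examples = sum(nrows for _, nrows in entry["splits"].values())
    return {
        "total_bytes": total_bytes,
        "total_examples": total_examples,
        # True when the split sizes add up to the reported dataset_size.
        "sizes_consistent": total_bytes == entry["dataset_size"],
        "avg_bytes_per_example": total_bytes / total_examples,
        "download_ratio": entry["download_size"] / entry["dataset_size"],
    }


def summarize_all(card):
    """Summarize every config in the card dictionary."""
    return {name: summarize(entry) for name, entry in card.items()}


for name, stats in summarize_all(CARD).items():
    print(
        f"{name}: {stats['total_examples']} examples, "
        f"{stats['total_bytes'] / 1e9:.2f} GB, "
        f"download/dataset ratio {stats['download_ratio']:.2f}, "
        f"consistent={stats['sizes_consistent']}"
    )
```

The average bytes per example gives a rough sense of row length in each config, which can help when choosing between a full download and streaming.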



