davidguzmanr/AfriInstruct-language-split
Source: Hugging Face. Updated 2024-07-26; indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/davidguzmanr/AfriInstruct-language-split
Summary:
This dataset provides one configuration per language or language pair (for example `ara`, `eng`, or `eng-fra`), covering languages such as English, French, and Arabic together with translation tasks between them. Every configuration has the same six string features: instruction, output, lang, split, source, and task. Configurations are divided into train, validation, and test splits where available, and the byte size and number of examples are recorded for each split, along with the download size and total dataset size of each configuration.
Provided by:
davidguzmanr
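Original information summary
Dataset overview
The dataset contains multiple language configurations, each covering one language or language pair with its own data splits. Details of each configuration follow.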
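As a minimal sketch of how one of these configurations might be loaded: the snippet assumes the repository is reachable on the Hugging Face Hub under the ID given in the title and that the `datasets` library is installed. `REPO_ID` and `load_config` are hypothetical names introduced here for illustration, not part of the dataset.

```python
# Assumption: the dataset is available on the Hugging Face Hub under this ID.
REPO_ID = "davidguzmanr/AfriInstruct-language-split"


def load_config(config_name: str, split: str = "train"):
    """Load one split of one configuration (hypothetical helper)."""
    # Imported lazily so that defining the helper needs no extra packages.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


# Example call (downloads data, so it is left commented out):
# ds = load_config("afr-eng", split="train")
# print(ds[0]["instruction"], ds[0]["output"])
```

Only some configurations in the listing have validation or test splits, so the `split` argument should be chosen from the splits recorded for the configuration being loaded.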
Dataset configurations
Config name: acq
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 27245229
    - num_examples: 41210
- Download size: 11697810
- Dataset size: 27245229
Config name: aeb
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 27636764
    - num_examples: 41210
- Download size: 12154368
- Dataset size: 27636764
Config name: afr
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 20839897
    - num_examples: 41280
- Download size: 10214951
- Dataset size: 20839897
Config name: afr-eng
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 471538.19991478464
    - num_examples: 991
- Download size: 195383
- Dataset size: 471538.19991478464
Config name: afr-fra
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 468683.2764036961
    - num_examples: 985
- Download size: 213442
- Dataset size: 468683.2764036961
Config name: amh
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 39950216.69315665
    - num_examples: 83716
  - validation:
    - num_bytes: 7236496.779463338
    - num_examples: 8425
  - test:
    - num_bytes: 10208636.842918986
    - num_examples: 11875
- Download size: 33162595
- Dataset size: 57395350.31553897
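For the `amh` configuration, the recorded dataset size appears to be the sum of the per-split `num_bytes` values. The short check below reproduces that arithmetic with the numbers copied from the listing; the variable names are illustrative only, and a tolerance is used because the values are floating-point.

```python
import math

# Per-split num_bytes for the `amh` configuration, copied from the listing.
amh_split_bytes = {
    "train": 39950216.69315665,
    "validation": 7236496.779463338,
    "test": 10208636.842918986,
}
# Dataset size recorded for the same configuration.
amh_dataset_size = 57395350.31553897

# Sum the splits and compare against the recorded total within a tolerance.
total = sum(amh_split_bytes.values())
sizes_match = math.isclose(total, amh_dataset_size, rel_tol=1e-9)
print(f"sum of splits = {total:.2f}, matches dataset size: {sizes_match}")
```

The same comparison can be repeated for any other configuration by substituting its split sizes and dataset size.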
Config name: amh-eng
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 708972.6719203119
    - num_examples: 1490
  - validation:
    - num_bytes: 378788.73350069224
    - num_examples: 441
- Download size: 421358
- Dataset size: 1087761.4054210042
Config name: amh-fra
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 482482.0733739572
    - num_examples: 1014
- Download size: 251706
- Dataset size: 482482.0733739572
Config name: ara
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 150787543.44399115
    - num_examples: 316900
  - validation:
    - num_bytes: 32413493.05241638
    - num_examples: 37737
  - test:
    - num_bytes: 73381830.81360544
    - num_examples: 85360
- Download size: 354933584
- Dataset size: 256582867.310013
Config name: ara-eng
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 464400.8911370633
    - num_examples: 976
- Download size: 211978
- Dataset size: 464400.8911370633
Config name: ara-fra
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 484385.3557146829
    - num_examples: 1018
- Download size: 239832
- Dataset size: 484385.3557146829
Config name: arb
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 48355312
    - num_examples: 66480
- Download size: 15661742
- Dataset size: 48355312
Config name: arq
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 21451
    - num_examples: 70
- Download size: 10063
- Dataset size: 21451
Config name: ars
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 27264801
    - num_examples: 41210
- Download size: 11640360
- Dataset size: 27264801
Config name: ary
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 28812801
    - num_examples: 41470
- Download size: 12294384
- Dataset size: 28812801
Config name: arz
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 27709422
    - num_examples: 41210
- Download size: 11885887
- Dataset size: 27709422
Config name: bem
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 4097
    - num_examples: 30
- Download size: 5185
- Dataset size: 4097
Config name: eng
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 62432894.802238956
    - num_examples: 131211
  - validation:
    - num_bytes: 32831792.628867257
    - num_examples: 38224
  - test:
    - num_bytes: 122623996.57043909
    - num_examples: 142640
- Download size: 198049322
- Dataset size: 217888684.0015453
Config name: eng-afr
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 484385.3557146829
    - num_examples: 1018
- Download size: 208740
- Dataset size: 484385.3557146829
Config name: eng-amh
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 740376.8305422855
    - num_examples: 1556
  - validation:
    - num_bytes: 393390.56676489126
    - num_examples: 458
- Download size: 433949
- Dataset size: 1133767.3973071766
Config name: eng-ara
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 491522.66449240415
    - num_examples: 1033
- Download size: 226322
- Dataset size: 491522.66449240415
Config name: eng-eng
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 955923.5556294675
    - num_examples: 2009
- Download size: 382003
- Dataset size: 955923.5556294675
Config name: eng-fra
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 969246.5320145471
    - num_examples: 2037
- Download size: 428593
- Dataset size: 969246.5320145471
Config name: eng-hau
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 2251107.188493286
    - num_examples: 4731
  - validation:
    - num_bytes: 535973.1739329523
    - num_examples: 624
- Download size: 1031610
- Dataset size: 2787080.362426238
Config name: eng-ibo
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 2497582.25161726
    - num_examples: 5249
  - validation:
    - num_bytes: 648493.183204133
    - num_examples: 755
- Download size: 922438
- Dataset size: 3146075.434821393
Config name: eng-kin
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 717061.621868396
    - num_examples: 1507
  - validation:
    - num_bytes: 200131.0088563748
    - num_examples: 233
- Download size: 398394
- Dataset size: 917192.6307247707
Config name: eng-nya
- Features:
  - instruction: string
  - output: string
  - lang: string
  - split: string
  - source: string
  - task: string
- Splits:
  - train:
    - num_bytes: 732763.7011793827
    - num_examples: 1540
  - validation:
    - num_bytes: 2198



