
anhnv125/ud_alpaca2

Source: Hugging Face. Updated: 2024-03-13. Indexed: 2024-06-11.
Download link:
https://hf-mirror.com/datasets/anhnv125/ud_alpaca2
Resource description:
---
dataset_info:
- config_name: be_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 70717334
    num_examples: 21555
  - name: validation
    num_bytes: 3567300
    num_examples: 1090
  - name: test
    num_bytes: 3084569
    num_examples: 889
  download_size: 7133074
  dataset_size: 77369203
- config_name: bxr_bdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 58584
    num_examples: 19
  - name: test
    num_bytes: 2992354
    num_examples: 908
  download_size: 292544
  dataset_size: 3050938
- config_name: cs_pdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 229105538
    num_examples: 68495
  - name: validation
    num_bytes: 31026344
    num_examples: 9270
  - name: test
    num_bytes: 33925044
    num_examples: 10148
  download_size: 33642578
  dataset_size: 294056926
- config_name: de_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 47097453
    num_examples: 13814
  - name: validation
    num_bytes: 2610159
    num_examples: 799
  - name: test
    num_bytes: 3246657
    num_examples: 977
  download_size: 6561391
  dataset_size: 52954269
- config_name: en_ewt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 40772463
    num_examples: 12543
  - name: validation
    num_bytes: 6256186
    num_examples: 2002
  - name: test
    num_bytes: 6455849
    num_examples: 2077
  download_size: 5048512
  dataset_size: 53484498
- config_name: es_ancora
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 54070781
    num_examples: 14305
  - name: validation
    num_bytes: 6283057
    num_examples: 1654
  - name: test
    num_bytes: 6474168
    num_examples: 1721
  download_size: 10844605
  dataset_size: 66828006
- config_name: fr_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 51554567
    num_examples: 14449
  - name: validation
    num_bytes: 5249987
    num_examples: 1476
  - name: test
    num_bytes: 1473053
    num_examples: 416
  download_size: 8413666
  dataset_size: 58277607
- config_name: hsb_ufal
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 79287
    num_examples: 23
  - name: test
    num_bytes: 2077117
    num_examples: 623
  download_size: 278220
  dataset_size: 2156404
- config_name: kk_ktb
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 110658
    num_examples: 31
  - name: test
    num_bytes: 3344564
    num_examples: 1047
  download_size: 323611
  dataset_size: 3455222
- config_name: lt_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 535083
    num_examples: 153
  - name: validation
    num_bytes: 535083
    num_examples: 153
  - name: test
    num_bytes: 535083
    num_examples: 153
  download_size: 284568
  dataset_size: 1605249
- config_name: ru_syntagrus
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 176413407
    num_examples: 48814
  - name: validation
    num_bytes: 23756503
    num_examples: 6584
  - name: test
    num_bytes: 23455102
    num_examples: 6491
  download_size: 28519423
  dataset_size: 223625012
configs:
- config_name: be_hse
  data_files:
  - split: train
    path: be_hse/train-*
  - split: validation
    path: be_hse/validation-*
  - split: test
    path: be_hse/test-*
- config_name: bxr_bdt
  data_files:
  - split: train
    path: bxr_bdt/train-*
  - split: test
    path: bxr_bdt/test-*
- config_name: cs_pdt
  data_files:
  - split: train
    path: cs_pdt/train-*
  - split: validation
    path: cs_pdt/validation-*
  - split: test
    path: cs_pdt/test-*
- config_name: de_gsd
  data_files:
  - split: train
    path: de_gsd/train-*
  - split: validation
    path: de_gsd/validation-*
  - split: test
    path: de_gsd/test-*
- config_name: en_ewt
  data_files:
  - split: train
    path: en_ewt/train-*
  - split: validation
    path: en_ewt/validation-*
  - split: test
    path: en_ewt/test-*
- config_name: es_ancora
  data_files:
  - split: train
    path: es_ancora/train-*
  - split: validation
    path: es_ancora/validation-*
  - split: test
    path: es_ancora/test-*
- config_name: fr_gsd
  data_files:
  - split: train
    path: fr_gsd/train-*
  - split: validation
    path: fr_gsd/validation-*
  - split: test
    path: fr_gsd/test-*
- config_name: hsb_ufal
  data_files:
  - split: train
    path: hsb_ufal/train-*
  - split: test
    path: hsb_ufal/test-*
- config_name: kk_ktb
  data_files:
  - split: train
    path: kk_ktb/train-*
  - split: test
    path: kk_ktb/test-*
- config_name: lt_hse
  data_files:
  - split: train
    path: lt_hse/train-*
  - split: validation
    path: lt_hse/validation-*
  - split: test
    path: lt_hse/test-*
- config_name: ru_syntagrus
  data_files:
  - split: train
    path: ru_syntagrus/train-*
  - split: validation
    path: ru_syntagrus/validation-*
  - split: test
    path: ru_syntagrus/test-*
---
Provider:
anhnv125
Summary of original information

Dataset overview

Dataset configuration

| Config | Features | Train | Validation | Test | Download size | Dataset size |
|---|---|---|---|---|---|---|
| be_hse | instruction, input, output | 70717334 B, 21555 examples | 3567300 B, 1090 examples | 3084569 B, 889 examples | 7133074 B | 77369203 B |
| bxr_bdt | instruction, input, output | 58584 B, 19 examples | - | 2992354 B, 908 examples | 292544 B | 3050938 B |
| cs_pdt | instruction, input, output | 229105538 B, 68495 examples | 31026344 B, 9270 examples | 33925044 B, 10148 examples | 33642578 B | 294056926 B |
| de_gsd | instruction, input, output | 47097453 B, 13814 examples | 2610159 B, 799 examples | 3246657 B, 977 examples | 6561391 B | 52954269 B |
| en_ewt | instruction, input, output | 40772463 B, 12543 examples | 6256186 B, 2002 examples | 6455849 B, 2077 examples | 5048512 B | 53484498 B |
| es_ancora | instruction, input, output | 54070781 B, 14305 examples | 6283057 B, 1654 examples | 6474168 B, 1721 examples | 10844605 B | 66828006 B |
| fr_gsd | instruction, input, output | 51554567 B, 14449 examples | 5249987 B, 1476 examples | 1473053 B, 416 examples | 8413666 B | 58277607 B |
| hsb_ufal | instruction, input, output | 79287 B, 23 examples | - | 2077117 B, 623 examples | 278220 B | 2156404 B |
| kk_ktb | instruction, input, output | 110658 B, 31 examples | - | 3344564 B, 1047 examples | 323611 B | 3455222 B |
| lt_hse | instruction, input, output | 535083 B, 153 examples | 535083 B, 153 examples | 535083 B, 153 examples | 284568 B | 1605249 B |
| ru_syntagrus | instruction, input, output | 176413407 B, 48814 examples | 23756503 B, 6584 examples | 23455102 B, 6491 examples | 28519423 B | 223625012 B |
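
The byte counts in the configuration table can be cross-checked: for each configuration, the reported dataset size should equal the sum of its split sizes. The following is a minimal sketch of that check; the names `SIZES` and `size_mismatches` are hypothetical, and the figures are copied from the table rather than read from the repository.

```python
# Split byte counts and reported dataset_size per configuration,
# copied by hand from the configuration table (not fetched from the hub).
SIZES = {
    "be_hse": ({"train": 70717334, "validation": 3567300, "test": 3084569}, 77369203),
    "bxr_bdt": ({"train": 58584, "test": 2992354}, 3050938),
    "cs_pdt": ({"train": 229105538, "validation": 31026344, "test": 33925044}, 294056926),
    "de_gsd": ({"train": 47097453, "validation": 2610159, "test": 3246657}, 52954269),
    "en_ewt": ({"train": 40772463, "validation": 6256186, "test": 6455849}, 53484498),
    "es_ancora": ({"train": 54070781, "validation": 6283057, "test": 6474168}, 66828006),
    "fr_gsd": ({"train": 51554567, "validation": 5249987, "test": 1473053}, 58277607),
    "hsb_ufal": ({"train": 79287, "test": 2077117}, 2156404),
    "kk_ktb": ({"train": 110658, "test": 3344564}, 3455222),
    "lt_hse": ({"train": 535083, "validation": 535083, "test": 535083}, 1605249),
    "ru_syntagrus": ({"train": 176413407, "validation": 23756503, "test": 23455102}, 223625012),
}


def size_mismatches(sizes):
    """Return the configs whose split bytes do not sum to dataset_size."""
    return [
        name
        for name, (splits, total) in sizes.items()
        if sum(splits.values()) != total
    ]


if __name__ == "__main__":
    bad = size_mismatches(SIZES)
    print("inconsistent configs:", bad if bad else "none")
```

For the figures listed in the table, every configuration passes this check, so `size_mismatches(SIZES)` returns an empty list.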

Dataset file paths

| Config | Train path | Validation path | Test path |
|---|---|---|---|
| be_hse | be_hse/train-* | be_hse/validation-* | be_hse/test-* |
| bxr_bdt | bxr_bdt/train-* | - | bxr_bdt/test-* |
| cs_pdt | cs_pdt/train-* | cs_pdt/validation-* | cs_pdt/test-* |
| de_gsd | de_gsd/train-* | de_gsd/validation-* | de_gsd/test-* |
| en_ewt | en_ewt/train-* | en_ewt/validation-* | en_ewt/test-* |
| es_ancora | es_ancora/train-* | es_ancora/validation-* | es_ancora/test-* |
| fr_gsd | fr_gsd/train-* | fr_gsd/validation-* | fr_gsd/test-* |
| hsb_ufal | hsb_ufal/train-* | - | hsb_ufal/test-* |
| kk_ktb | kk_ktb/train-* | - | kk_ktb/test-* |
| lt_hse | lt_hse/train-* | lt_hse/validation-* | lt_hse/test-* |
| ru_syntagrus | ru_syntagrus/train-* | ru_syntagrus/validation-* | ru_syntagrus/test-* |
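
Because three configurations (bxr_bdt, hsb_ufal, kk_ktb) have no validation split, it can help to check the requested split before loading. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID `anhnv125/ud_alpaca2` from the page title; `SPLITS`, `check_split`, and `load_split` are hypothetical helper names, not part of the dataset or the library.

```python
# Splits available per configuration, transcribed from the file-path table.
SPLITS = {
    "be_hse": ("train", "validation", "test"),
    "bxr_bdt": ("train", "test"),
    "cs_pdt": ("train", "validation", "test"),
    "de_gsd": ("train", "validation", "test"),
    "en_ewt": ("train", "validation", "test"),
    "es_ancora": ("train", "validation", "test"),
    "fr_gsd": ("train", "validation", "test"),
    "hsb_ufal": ("train", "test"),
    "kk_ktb": ("train", "test"),
    "lt_hse": ("train", "validation", "test"),
    "ru_syntagrus": ("train", "validation", "test"),
}

REPO_ID = "anhnv125/ud_alpaca2"  # taken from the page title


def check_split(config, split):
    """Raise if the config is unknown or does not provide the split."""
    if config not in SPLITS:
        raise KeyError(f"unknown config: {config}")
    if split not in SPLITS[config]:
        raise ValueError(f"{config} has no '{split}' split; available: {SPLITS[config]}")


def load_split(config, split="train"):
    """Validate the request locally, then load one split of one config."""
    check_split(config, split)
    # Imported lazily so the table above is usable without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config, split=split)


if __name__ == "__main__":
    # Requires network access; each record is expected to carry the
    # string fields instruction, input, and output.
    ds = load_split("en_ewt", "validation")
    print(ds[0]["instruction"])
```

Validating against a local table is a design choice for fast failure; the authoritative list of splits is the dataset card's own metadata.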