anhnv125/ud_alpaca2
Source: Hugging Face. Updated 2024-03-13; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/anhnv125/ud_alpaca2
Resource description:
---
dataset_info:
- config_name: be_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 70717334
    num_examples: 21555
  - name: validation
    num_bytes: 3567300
    num_examples: 1090
  - name: test
    num_bytes: 3084569
    num_examples: 889
  download_size: 7133074
  dataset_size: 77369203
- config_name: bxr_bdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 58584
    num_examples: 19
  - name: test
    num_bytes: 2992354
    num_examples: 908
  download_size: 292544
  dataset_size: 3050938
- config_name: cs_pdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 229105538
    num_examples: 68495
  - name: validation
    num_bytes: 31026344
    num_examples: 9270
  - name: test
    num_bytes: 33925044
    num_examples: 10148
  download_size: 33642578
  dataset_size: 294056926
- config_name: de_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 47097453
    num_examples: 13814
  - name: validation
    num_bytes: 2610159
    num_examples: 799
  - name: test
    num_bytes: 3246657
    num_examples: 977
  download_size: 6561391
  dataset_size: 52954269
- config_name: en_ewt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 40772463
    num_examples: 12543
  - name: validation
    num_bytes: 6256186
    num_examples: 2002
  - name: test
    num_bytes: 6455849
    num_examples: 2077
  download_size: 5048512
  dataset_size: 53484498
- config_name: es_ancora
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 54070781
    num_examples: 14305
  - name: validation
    num_bytes: 6283057
    num_examples: 1654
  - name: test
    num_bytes: 6474168
    num_examples: 1721
  download_size: 10844605
  dataset_size: 66828006
- config_name: fr_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 51554567
    num_examples: 14449
  - name: validation
    num_bytes: 5249987
    num_examples: 1476
  - name: test
    num_bytes: 1473053
    num_examples: 416
  download_size: 8413666
  dataset_size: 58277607
- config_name: hsb_ufal
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 79287
    num_examples: 23
  - name: test
    num_bytes: 2077117
    num_examples: 623
  download_size: 278220
  dataset_size: 2156404
- config_name: kk_ktb
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 110658
    num_examples: 31
  - name: test
    num_bytes: 3344564
    num_examples: 1047
  download_size: 323611
  dataset_size: 3455222
- config_name: lt_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 535083
    num_examples: 153
  - name: validation
    num_bytes: 535083
    num_examples: 153
  - name: test
    num_bytes: 535083
    num_examples: 153
  download_size: 284568
  dataset_size: 1605249
- config_name: ru_syntagrus
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 176413407
    num_examples: 48814
  - name: validation
    num_bytes: 23756503
    num_examples: 6584
  - name: test
    num_bytes: 23455102
    num_examples: 6491
  download_size: 28519423
  dataset_size: 223625012
configs:
- config_name: be_hse
  data_files:
  - split: train
    path: be_hse/train-*
  - split: validation
    path: be_hse/validation-*
  - split: test
    path: be_hse/test-*
- config_name: bxr_bdt
  data_files:
  - split: train
    path: bxr_bdt/train-*
  - split: test
    path: bxr_bdt/test-*
- config_name: cs_pdt
  data_files:
  - split: train
    path: cs_pdt/train-*
  - split: validation
    path: cs_pdt/validation-*
  - split: test
    path: cs_pdt/test-*
- config_name: de_gsd
  data_files:
  - split: train
    path: de_gsd/train-*
  - split: validation
    path: de_gsd/validation-*
  - split: test
    path: de_gsd/test-*
- config_name: en_ewt
  data_files:
  - split: train
    path: en_ewt/train-*
  - split: validation
    path: en_ewt/validation-*
  - split: test
    path: en_ewt/test-*
- config_name: es_ancora
  data_files:
  - split: train
    path: es_ancora/train-*
  - split: validation
    path: es_ancora/validation-*
  - split: test
    path: es_ancora/test-*
- config_name: fr_gsd
  data_files:
  - split: train
    path: fr_gsd/train-*
  - split: validation
    path: fr_gsd/validation-*
  - split: test
    path: fr_gsd/test-*
- config_name: hsb_ufal
  data_files:
  - split: train
    path: hsb_ufal/train-*
  - split: test
    path: hsb_ufal/test-*
- config_name: kk_ktb
  data_files:
  - split: train
    path: kk_ktb/train-*
  - split: test
    path: kk_ktb/test-*
- config_name: lt_hse
  data_files:
  - split: train
    path: lt_hse/train-*
  - split: validation
    path: lt_hse/validation-*
  - split: test
    path: lt_hse/test-*
- config_name: ru_syntagrus
  data_files:
  - split: train
    path: ru_syntagrus/train-*
  - split: validation
    path: ru_syntagrus/validation-*
  - split: test
    path: ru_syntagrus/test-*
---
Provider:
anhnv125
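Summary of original information
Dataset overview
Dataset configuration information
| Config name | Features | Train split | Validation split | Test split | Download size | Dataset size |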
|---|---|---|---|---|---|---|
| be_hse | instruction, input, output | 70717334 B, 21555 examples | 3567300 B, 1090 examples | 3084569 B, 889 examples | 7133074 B | 77369203 B |
| bxr_bdt | instruction, input, output | 58584 B, 19 examples | - | 2992354 B, 908 examples | 292544 B | 3050938 B |
| cs_pdt | instruction, input, output | 229105538 B, 68495 examples | 31026344 B, 9270 examples | 33925044 B, 10148 examples | 33642578 B | 294056926 B |
| de_gsd | instruction, input, output | 47097453 B, 13814 examples | 2610159 B, 799 examples | 3246657 B, 977 examples | 6561391 B | 52954269 B |
| en_ewt | instruction, input, output | 40772463 B, 12543 examples | 6256186 B, 2002 examples | 6455849 B, 2077 examples | 5048512 B | 53484498 B |
| es_ancora | instruction, input, output | 54070781 B, 14305 examples | 6283057 B, 1654 examples | 6474168 B, 1721 examples | 10844605 B | 66828006 B |
| fr_gsd | instruction, input, output | 51554567 B, 14449 examples | 5249987 B, 1476 examples | 1473053 B, 416 examples | 8413666 B | 58277607 B |
| hsb_ufal | instruction, input, output | 79287 B, 23 examples | - | 2077117 B, 623 examples | 278220 B | 2156404 B |
| kk_ktb | instruction, input, output | 110658 B, 31 examples | - | 3344564 B, 1047 examples | 323611 B | 3455222 B |
| lt_hse | instruction, input, output | 535083 B, 153 examples | 535083 B, 153 examples | 535083 B, 153 examples | 284568 B | 1605249 B |
| ru_syntagrus | instruction, input, output | 176413407 B, 48814 examples | 23756503 B, 6584 examples | 23455102 B, 6491 examples | 28519423 B | 223625012 B |
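The byte counts in the table can be cross-checked: for each configuration, the dataset size should equal the sum of its split sizes, and the download size divided by the dataset size gives a rough compression ratio. A minimal sketch in Python, using three rows copied from the table; the names `SPLIT_BYTES`, `SIZES`, `split_total`, and `download_ratio` are illustrative and not part of the dataset:

```python
# Values copied from the configuration table; all names here are illustrative.
SPLIT_BYTES = {
    "be_hse": {"train": 70717334, "validation": 3567300, "test": 3084569},
    "bxr_bdt": {"train": 58584, "test": 2992354},
    "lt_hse": {"train": 535083, "validation": 535083, "test": 535083},
}

# (download_size, dataset_size) in bytes for the same configurations.
SIZES = {
    "be_hse": (7133074, 77369203),
    "bxr_bdt": (292544, 3050938),
    "lt_hse": (284568, 1605249),
}


def split_total(config: str) -> int:
    """Sum the byte counts of every split listed for a configuration."""
    return sum(SPLIT_BYTES[config].values())


def download_ratio(config: str) -> float:
    """Return download_size / dataset_size for a configuration."""
    download_size, dataset_size = SIZES[config]
    return download_size / dataset_size


for config in SPLIT_BYTES:
    total = split_total(config)
    _, dataset_size = SIZES[config]
    status = "ok" if total == dataset_size else "MISMATCH"
    print(f"{config}: splits sum to {total} B ({status}), "
          f"download/dataset = {download_ratio(config):.3f}")
```

The same check extends to the remaining rows by adding them to the two dictionaries.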
Dataset file paths
| Config name | Train path | Validation path | Test path |
|---|---|---|---|
| be_hse | be_hse/train-* | be_hse/validation-* | be_hse/test-* |
| bxr_bdt | bxr_bdt/train-* | - | bxr_bdt/test-* |
| cs_pdt | cs_pdt/train-* | cs_pdt/validation-* | cs_pdt/test-* |
| de_gsd | de_gsd/train-* | de_gsd/validation-* | de_gsd/test-* |
| en_ewt | en_ewt/train-* | en_ewt/validation-* | en_ewt/test-* |
| es_ancora | es_ancora/train-* | es_ancora/validation-* | es_ancora/test-* |
| fr_gsd | fr_gsd/train-* | fr_gsd/validation-* | fr_gsd/test-* |
| hsb_ufal | hsb_ufal/train-* | - | hsb_ufal/test-* |
| kk_ktb | kk_ktb/train-* | - | kk_ktb/test-* |
| lt_hse | lt_hse/train-* | lt_hse/validation-* | lt_hse/test-* |
| ru_syntagrus | ru_syntagrus/train-* | ru_syntagrus/validation-* | ru_syntagrus/test-* |
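Every path in the table follows one `<config>/<split>-*` pattern, and three configurations (bxr_bdt, hsb_ufal, kk_ktb) have no validation split. The sketch below encodes that and shows one way a split might be loaded with the Hugging Face `datasets` library. The helper names are hypothetical, and `load_split` assumes `datasets` is installed and the `anhnv125/ud_alpaca2` repository is reachable:

```python
# Configuration names and split availability, taken from the path table.
ALL_CONFIGS = [
    "be_hse", "bxr_bdt", "cs_pdt", "de_gsd", "en_ewt", "es_ancora",
    "fr_gsd", "hsb_ufal", "kk_ktb", "lt_hse", "ru_syntagrus",
]
CONFIGS_WITHOUT_VALIDATION = {"bxr_bdt", "hsb_ufal", "kk_ktb"}


def available_splits(config: str) -> list[str]:
    """Return the splits the table lists for a configuration."""
    if config not in ALL_CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if config in CONFIGS_WITHOUT_VALIDATION:
        return ["train", "test"]
    return ["train", "validation", "test"]


def data_file_pattern(config: str, split: str) -> str:
    """Build the glob pattern for one split, e.g. 'en_ewt/train-*'."""
    if split not in available_splits(config):
        raise ValueError(f"{config!r} has no {split!r} split")
    return f"{config}/{split}-*"


def load_split(config: str, split: str):
    """Load one split from the Hub (hypothetical helper; needs network access)."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    if split not in available_splits(config):
        raise ValueError(f"{config!r} has no {split!r} split")
    return load_dataset("anhnv125/ud_alpaca2", config, split=split)


print(available_splits("kk_ktb"))
print(data_file_pattern("en_ewt", "validation"))
```

Checking split availability before calling `load_dataset` turns a request for a missing validation split into an immediate, readable error rather than a failure inside the library.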



