anhnv125/ud_alpaca
Source: Hugging Face. Updated 2024-03-05; indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/anhnv125/ud_alpaca
Resource description:
---
dataset_info:
- config_name: be_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 66768815
    num_examples: 21555
  - name: validation
    num_bytes: 3370351
    num_examples: 1090
  - name: test
    num_bytes: 2873580
    num_examples: 889
  download_size: 5480853
  dataset_size: 73012746
- config_name: bxr_bdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 56167
    num_examples: 19
  - name: test
    num_bytes: 2821495
    num_examples: 908
  download_size: 228304
  dataset_size: 2877662
- config_name: cs_pdt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 216399245
    num_examples: 68495
  - name: validation
    num_bytes: 29301204
    num_examples: 9270
  - name: test
    num_bytes: 32048085
    num_examples: 10148
  download_size: 25707376
  dataset_size: 277748534
- config_name: de_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 44307017
    num_examples: 13814
  - name: validation
    num_bytes: 2477610
    num_examples: 799
  - name: test
    num_bytes: 3070360
    num_examples: 977
  download_size: 4999156
  dataset_size: 49854987
- config_name: en_ewt
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 38805886
    num_examples: 12543
  - name: validation
    num_bytes: 6000641
    num_examples: 2002
  - name: test
    num_bytes: 6198885
    num_examples: 2077
  download_size: 3810046
  dataset_size: 51005412
- config_name: es_ancora
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 49943950
    num_examples: 14305
  - name: validation
    num_bytes: 5798461
    num_examples: 1654
  - name: test
    num_bytes: 5985191
    num_examples: 1721
  download_size: 8063762
  dataset_size: 61727602
- config_name: fr_gsd
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 48157929
    num_examples: 14449
  - name: validation
    num_bytes: 4906593
    num_examples: 1476
  - name: test
    num_bytes: 1378398
    num_examples: 416
  download_size: 6341149
  dataset_size: 54442920
- config_name: hsb_ufal
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 74433
    num_examples: 23
  - name: test
    num_bytes: 1963315
    num_examples: 623
  download_size: 218777
  dataset_size: 2037748
- config_name: kk_ktb
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 102630
    num_examples: 31
  - name: test
    num_bytes: 3176663
    num_examples: 1047
  download_size: 257360
  dataset_size: 3279293
- config_name: lt_hse
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 501163
    num_examples: 153
  - name: validation
    num_bytes: 501163
    num_examples: 153
  - name: test
    num_bytes: 501163
    num_examples: 153
  download_size: 229455
  dataset_size: 1503489
- config_name: ru_syntagrus
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 163096580
    num_examples: 48814
  - name: validation
    num_bytes: 21977495
    num_examples: 6584
  - name: test
    num_bytes: 21691135
    num_examples: 6491
  download_size: 21623891
  dataset_size: 206765210
configs:
- config_name: be_hse
  data_files:
  - split: train
    path: be_hse/train-*
  - split: validation
    path: be_hse/validation-*
  - split: test
    path: be_hse/test-*
- config_name: bxr_bdt
  data_files:
  - split: train
    path: bxr_bdt/train-*
  - split: test
    path: bxr_bdt/test-*
- config_name: cs_pdt
  data_files:
  - split: train
    path: cs_pdt/train-*
  - split: validation
    path: cs_pdt/validation-*
  - split: test
    path: cs_pdt/test-*
- config_name: de_gsd
  data_files:
  - split: train
    path: de_gsd/train-*
  - split: validation
    path: de_gsd/validation-*
  - split: test
    path: de_gsd/test-*
- config_name: en_ewt
  data_files:
  - split: train
    path: en_ewt/train-*
  - split: validation
    path: en_ewt/validation-*
  - split: test
    path: en_ewt/test-*
- config_name: es_ancora
  data_files:
  - split: train
    path: es_ancora/train-*
  - split: validation
    path: es_ancora/validation-*
  - split: test
    path: es_ancora/test-*
- config_name: fr_gsd
  data_files:
  - split: train
    path: fr_gsd/train-*
  - split: validation
    path: fr_gsd/validation-*
  - split: test
    path: fr_gsd/test-*
- config_name: hsb_ufal
  data_files:
  - split: train
    path: hsb_ufal/train-*
  - split: test
    path: hsb_ufal/test-*
- config_name: kk_ktb
  data_files:
  - split: train
    path: kk_ktb/train-*
  - split: test
    path: kk_ktb/test-*
- config_name: lt_hse
  data_files:
  - split: train
    path: lt_hse/train-*
  - split: validation
    path: lt_hse/validation-*
  - split: test
    path: lt_hse/test-*
- config_name: ru_syntagrus
  data_files:
  - split: train
    path: ru_syntagrus/train-*
  - split: validation
    path: ru_syntagrus/validation-*
  - split: test
    path: ru_syntagrus/test-*
---
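The metadata lists eleven configurations, three of which (`bxr_bdt`, `hsb_ufal`, `kk_ktb`) have no `validation` split. The following is a minimal sketch of how one might guard against requesting a missing split before loading. The helper names `available_splits` and `load_split` are hypothetical, and the sketch assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID `anhnv125/ud_alpaca`.

```python
from typing import Dict, List

REPO_ID = "anhnv125/ud_alpaca"

_FULL = ["train", "validation", "test"]
_NO_VALIDATION = ["train", "test"]

# Splits per configuration, copied from the `configs` metadata.
CONFIG_SPLITS: Dict[str, List[str]] = {
    "be_hse": _FULL,
    "bxr_bdt": _NO_VALIDATION,
    "cs_pdt": _FULL,
    "de_gsd": _FULL,
    "en_ewt": _FULL,
    "es_ancora": _FULL,
    "fr_gsd": _FULL,
    "hsb_ufal": _NO_VALIDATION,
    "kk_ktb": _NO_VALIDATION,
    "lt_hse": _FULL,
    "ru_syntagrus": _FULL,
}


def available_splits(config: str) -> List[str]:
    """Return the splits listed for a configuration (hypothetical helper)."""
    if config not in CONFIG_SPLITS:
        raise KeyError(f"unknown config {config!r}; expected one of {sorted(CONFIG_SPLITS)}")
    return list(CONFIG_SPLITS[config])


def load_split(config: str, split: str):
    """Load one split; fails early if the metadata does not list it."""
    if split not in available_splits(config):
        raise ValueError(f"config {config!r} has no {split!r} split")
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config, split=split)
```

For example, `load_split("en_ewt", "validation")` would attempt a download, while `load_split("kk_ktb", "validation")` raises `ValueError` without touching the network.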
Provider:
anhnv125
Summary of original information
Dataset overview
Dataset configurations
be_hse
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 66768815 bytes, 21555 examples
  - validation: 3370351 bytes, 1090 examples
  - test: 2873580 bytes, 889 examples
- Download size: 5480853 bytes
- Dataset size: 73012746 bytes
bxr_bdt
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 56167 bytes, 19 examples
  - test: 2821495 bytes, 908 examples
- Download size: 228304 bytes
- Dataset size: 2877662 bytes
cs_pdt
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 216399245 bytes, 68495 examples
  - validation: 29301204 bytes, 9270 examples
  - test: 32048085 bytes, 10148 examples
- Download size: 25707376 bytes
- Dataset size: 277748534 bytes
de_gsd
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 44307017 bytes, 13814 examples
  - validation: 2477610 bytes, 799 examples
  - test: 3070360 bytes, 977 examples
- Download size: 4999156 bytes
- Dataset size: 49854987 bytes
en_ewt
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 38805886 bytes, 12543 examples
  - validation: 6000641 bytes, 2002 examples
  - test: 6198885 bytes, 2077 examples
- Download size: 3810046 bytes
- Dataset size: 51005412 bytes
es_ancora
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 49943950 bytes, 14305 examples
  - validation: 5798461 bytes, 1654 examples
  - test: 5985191 bytes, 1721 examples
- Download size: 8063762 bytes
- Dataset size: 61727602 bytes
fr_gsd
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 48157929 bytes, 14449 examples
  - validation: 4906593 bytes, 1476 examples
  - test: 1378398 bytes, 416 examples
- Download size: 6341149 bytes
- Dataset size: 54442920 bytes
hsb_ufal
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 74433 bytes, 23 examples
  - test: 1963315 bytes, 623 examples
- Download size: 218777 bytes
- Dataset size: 2037748 bytes
kk_ktb
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 102630 bytes, 31 examples
  - test: 3176663 bytes, 1047 examples
- Download size: 257360 bytes
- Dataset size: 3279293 bytes
lt_hse
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 501163 bytes, 153 examples
  - validation: 501163 bytes, 153 examples
  - test: 501163 bytes, 153 examples
- Download size: 229455 bytes
- Dataset size: 1503489 bytes
ru_syntagrus
- Features:
  - instruction: string
  - input: string
  - output: string
- Splits:
  - train: 163096580 bytes, 48814 examples
  - validation: 21977495 bytes, 6584 examples
  - test: 21691135 bytes, 6491 examples
- Download size: 21623891 bytes
- Dataset size: 206765210 bytes
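
Every configuration exposes the same three string fields: `instruction`, `input`, and `output`. As a rough sketch, a record could be turned into a training prompt as shown below. The templates use the wording popularized by the Stanford Alpaca project; whether this dataset was built for that exact template is an assumption based only on its name. The function `build_prompt` and the sample record are hypothetical and not taken from the dataset.

```python
from typing import Mapping

# Prompt wording follows the commonly used Stanford Alpaca templates
# (an assumption; the dataset card does not specify a template).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: Mapping[str, str]) -> str:
    """Format one record's `instruction` and `input` fields into a prompt."""
    instruction = record["instruction"]
    extra_input = record.get("input", "")
    if extra_input.strip():
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=extra_input)
    # Records with an empty `input` fall back to the shorter template.
    return PROMPT_NO_INPUT.format(instruction=instruction)


# Hypothetical record for illustration only; real records may look different.
sample = {
    "instruction": "Tag each word with its part of speech.",
    "input": "Dogs bark.",
    "output": "Dogs/NOUN bark/VERB ./PUNCT",
}
prompt = build_prompt(sample)
```

During fine-tuning, the `output` field would typically be appended after the `### Response:` marker as the target text.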



