spdenisov/ud
Source: Hugging Face. Updated 2023-03-11; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/spdenisov/ud
Description:
---
dataset_info:
  features:
  - name: input_ids
    sequence: int32
  splits:
  - name: de_gsd_ud_train
    num_bytes: 113219544
    num_examples: 13814
  - name: fr_parisstories_ud_train
    num_bytes: 11392440
    num_examples: 1390
  - name: ar_nyuad_ud_train
    num_bytes: 129406644
    num_examples: 15789
  - name: hu_szeged_ud_train
    num_bytes: 7458360
    num_examples: 910
  - name: fi_ftb_ud_train
    num_bytes: 122784276
    num_examples: 14981
  - name: it_vit_ud_train
    num_bytes: 67838292
    num_examples: 8277
  - name: zh_gsdsimp_ud_train
    num_bytes: 32759412
    num_examples: 3997
  - name: hy_bsut_ud_train
    num_bytes: 10048296
    num_examples: 1226
  - name: cop_scriptorium_ud_train
    num_bytes: 11302284
    num_examples: 1379
  - name: no_bokmaal_ud_train
    num_bytes: 128644416
    num_examples: 15696
  - name: gv_cadhan_ud_train
    num_bytes: 9605712
    num_examples: 1172
  - name: fr_partut_ud_train
    num_bytes: 6581388
    num_examples: 803
  - name: fi_tdt_ud_train
    num_bytes: 100130532
    num_examples: 12217
  - name: tr_kenet_ud_train
    num_bytes: 126202008
    num_examples: 15398
  - name: fr_sequoia_ud_train
    num_bytes: 18285276
    num_examples: 2231
  - name: hy_armtdp_ud_train
    num_bytes: 16178904
    num_examples: 1974
  - name: ga_idt_ud_train
    num_bytes: 32824980
    num_examples: 4005
  - name: ru_syntagrus_ud_train_b
    num_bytes: 199146408
    num_examples: 24298
  - name: nl_alpino_ud_train
    num_bytes: 100720644
    num_examples: 12289
  - name: en_partut_ud_train
    num_bytes: 14597076
    num_examples: 1781
  - name: en_gum_ud_train
    num_bytes: 56642556
    num_examples: 6911
  - name: ru_taiga_ud_train
    num_bytes: 131504820
    num_examples: 16045
  - name: da_ddt_ud_train
    num_bytes: 35923068
    num_examples: 4383
  - name: zh_gsd_ud_train
    num_bytes: 32759412
    num_examples: 3997
  - name: de_hdt_ud_train_a_2
    num_bytes: 307472940
    num_examples: 37515
  - name: it_isdt_ud_train
    num_bytes: 107539716
    num_examples: 13121
  - name: es_gsd_ud_train
    num_bytes: 116276652
    num_examples: 14187
  - name: gd_arcosg_ud_train
    num_bytes: 29022036
    num_examples: 3541
  - name: ru_syntagrus_ud_train_a
    num_bytes: 200933136
    num_examples: 24516
  - name: de_hdt_ud_train_a_1
    num_bytes: 312283992
    num_examples: 38102
  - name: tr_penn_ud_train
    num_bytes: 121710600
    num_examples: 14850
  - name: tr_atis_ud_train
    num_bytes: 35029704
    num_examples: 4274
  - name: en_lines_ud_train
    num_bytes: 26030496
    num_examples: 3176
  - name: cs_pdt_ud_train_c
    num_bytes: 73255848
    num_examples: 8938
  - name: de_hdt_ud_train_b_2
    num_bytes: 319701372
    num_examples: 39007
  - name: cs_pdt_ud_train_l
    num_bytes: 340617564
    num_examples: 41559
  - name: cs_pdt_ud_train_v
    num_bytes: 55880328
    num_examples: 6818
  - name: pt_cintil_ud_train
    num_bytes: 251781120
    num_examples: 30720
  - name: no_nynorsklia_ud_train
    num_bytes: 27964752
    num_examples: 3412
  - name: cs_cac_ud_train
    num_bytes: 192425688
    num_examples: 23478
  - name: tr_framenet_ud_train
    num_bytes: 18752448
    num_examples: 2288
  - name: it_parlamint_ud_train
    num_bytes: 2671896
    num_examples: 326
  - name: es_ancora_ud_train
    num_bytes: 117096252
    num_examples: 14287
  - name: en_atis_ud_train
    num_bytes: 35029704
    num_examples: 4274
  - name: cy_ccg_ud_train
    num_bytes: 9105756
    num_examples: 1111
  - name: fr_gsd_ud_train
    num_bytes: 118432200
    num_examples: 14450
  - name: tr_tourism_ud_train
    num_bytes: 126841296
    num_examples: 15476
  - name: en_ewt_ud_train
    num_bytes: 102810624
    num_examples: 12544
  - name: ru_syntagrus_ud_train_c
    num_bytes: 170607936
    num_examples: 20816
  - name: fr_ftb_ud_train
    num_bytes: 120964764
    num_examples: 14759
  - name: cs_fictree_ud_train
    num_bytes: 83271360
    num_examples: 10160
  - name: ar_padt_ud_train
    num_bytes: 49790700
    num_examples: 6075
  - name: ro_rrt_ud_train
    num_bytes: 65920428
    num_examples: 8043
  - name: tr_boun_ud_train
    num_bytes: 63953388
    num_examples: 7803
  - name: de_hdt_ud_train_b_1
    num_bytes: 314816556
    num_examples: 38411
  - name: cs_pdt_ud_train_m
    num_bytes: 91631280
    num_examples: 11180
  - name: no_nynorsk_ud_train
    num_bytes: 116170104
    num_examples: 14174
  - name: ru_gsd_ud_train
    num_bytes: 31554600
    num_examples: 3850
  - name: fr_rhapsodie_ud_train
    num_bytes: 10556448
    num_examples: 1288
  download_size: 357289141
  dataset_size: 5683860432
---
# Dataset Card for "ud"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
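
The card gives no usage instructions. As a minimal sketch, assuming the third-party `datasets` library is installed and that the repository is reachable under the id `spdenisov/ud`, a single split could be loaded as shown here. The helper name `load_ud_split` is hypothetical; the split names are the ones listed in the metadata above.

```python
REPO_ID = "spdenisov/ud"  # repository id taken from the page title


def load_ud_split(split_name: str, streaming: bool = True):
    """Load one named split, e.g. "en_ewt_ud_train".

    Streaming is on by default (an assumption made for convenience) so that
    the full dataset does not have to be downloaded first.
    """
    # Imported lazily so that defining the helper does not require the library.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(REPO_ID, split=split_name, streaming=streaming)
```

With streaming enabled, `next(iter(load_ud_split("en_ewt_ud_train")))["input_ids"]` should yield the integer sequence of the first example, provided the hub is reachable.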
Provider:
spdenisov
Summary of original information
Dataset overview
Dataset features
- Name: input_ids
- Sequence type: int32
Dataset splits
| Name | Bytes | Examples |
|---|---|---|
| de_gsd_ud_train | 113219544 | 13814 |
| fr_parisstories_ud_train | 11392440 | 1390 |
| ar_nyuad_ud_train | 129406644 | 15789 |
| hu_szeged_ud_train | 7458360 | 910 |
| fi_ftb_ud_train | 122784276 | 14981 |
| it_vit_ud_train | 67838292 | 8277 |
| zh_gsdsimp_ud_train | 32759412 | 3997 |
| hy_bsut_ud_train | 10048296 | 1226 |
| cop_scriptorium_ud_train | 11302284 | 1379 |
| no_bokmaal_ud_train | 128644416 | 15696 |
| gv_cadhan_ud_train | 9605712 | 1172 |
| fr_partut_ud_train | 6581388 | 803 |
| fi_tdt_ud_train | 100130532 | 12217 |
| tr_kenet_ud_train | 126202008 | 15398 |
| fr_sequoia_ud_train | 18285276 | 2231 |
| hy_armtdp_ud_train | 16178904 | 1974 |
| ga_idt_ud_train | 32824980 | 4005 |
| ru_syntagrus_ud_train_b | 199146408 | 24298 |
| nl_alpino_ud_train | 100720644 | 12289 |
| en_partut_ud_train | 14597076 | 1781 |
| en_gum_ud_train | 56642556 | 6911 |
| ru_taiga_ud_train | 131504820 | 16045 |
| da_ddt_ud_train | 35923068 | 4383 |
| zh_gsd_ud_train | 32759412 | 3997 |
| de_hdt_ud_train_a_2 | 307472940 | 37515 |
| it_isdt_ud_train | 107539716 | 13121 |
| es_gsd_ud_train | 116276652 | 14187 |
| gd_arcosg_ud_train | 29022036 | 3541 |
| ru_syntagrus_ud_train_a | 200933136 | 24516 |
| de_hdt_ud_train_a_1 | 312283992 | 38102 |
| tr_penn_ud_train | 121710600 | 14850 |
| tr_atis_ud_train | 35029704 | 4274 |
| en_lines_ud_train | 26030496 | 3176 |
| cs_pdt_ud_train_c | 73255848 | 8938 |
| de_hdt_ud_train_b_2 | 319701372 | 39007 |
| cs_pdt_ud_train_l | 340617564 | 41559 |
| cs_pdt_ud_train_v | 55880328 | 6818 |
| pt_cintil_ud_train | 251781120 | 30720 |
| no_nynorsklia_ud_train | 27964752 | 3412 |
| cs_cac_ud_train | 192425688 | 23478 |
| tr_framenet_ud_train | 18752448 | 2288 |
| it_parlamint_ud_train | 2671896 | 326 |
| es_ancora_ud_train | 117096252 | 14287 |
| en_atis_ud_train | 35029704 | 4274 |
| cy_ccg_ud_train | 9105756 | 1111 |
| fr_gsd_ud_train | 118432200 | 14450 |
| tr_tourism_ud_train | 126841296 | 15476 |
| en_ewt_ud_train | 102810624 | 12544 |
| ru_syntagrus_ud_train_c | 170607936 | 20816 |
| fr_ftb_ud_train | 120964764 | 14759 |
| cs_fictree_ud_train | 83271360 | 10160 |
| ar_padt_ud_train | 49790700 | 6075 |
| ro_rrt_ud_train | 65920428 | 8043 |
| tr_boun_ud_train | 63953388 | 7803 |
| de_hdt_ud_train_b_1 | 314816556 | 38411 |
| cs_pdt_ud_train_m | 91631280 | 11180 |
| no_nynorsk_ud_train | 116170104 | 14174 |
| ru_gsd_ud_train | 31554600 | 3850 |
| fr_rhapsodie_ud_train | 10556448 | 1288 |
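
The split names appear to follow a `<language>_<treebank>_ud_train[_<shard>]` pattern. The card does not document this, so the sketch below treats it as an assumption; `parse_split_name` is a hypothetical helper, and the names in `SAMPLE_SPLITS` are a subset copied from the table.

```python
from collections import Counter
from typing import Optional, Tuple

SAMPLE_SPLITS = [
    "de_gsd_ud_train",
    "de_hdt_ud_train_a_1",
    "de_hdt_ud_train_a_2",
    "cs_pdt_ud_train_c",
    "cop_scriptorium_ud_train",
    "en_ewt_ud_train",
]


def parse_split_name(name: str) -> Tuple[str, str, Optional[str]]:
    """Split a name into (language, treebank, shard); shard is None if absent."""
    prefix, marker, suffix = name.partition("_ud_train")
    if not marker:
        raise ValueError(f"unexpected split name: {name!r}")
    language, treebank = prefix.split("_", 1)
    shard = suffix.lstrip("_") or None
    return language, treebank, shard


# Count how many sample splits belong to each language code.
splits_per_language = Counter(parse_split_name(s)[0] for s in SAMPLE_SPLITS)
print(splits_per_language)
```

Grouping by the first component this way makes it easy to select, for example, every German split, including the sharded `de_hdt` parts.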
Dataset size
- Download size: 357289141 bytes
- Dataset size: 5683860432 bytes
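
Dividing each split's byte count by its example count gives the same value for the splits checked below, which suggests fixed-length examples. The sketch reproduces that arithmetic for a few rows copied from the table. Reading the result as a token count is an assumption: 8196 bytes could mean 2049 `int32` values, or 2048 values plus 4 bytes of per-row overhead, and the card does not say which.

```python
INT32_BYTES = 4

# (num_bytes, num_examples) for a few splits, copied from the table.
SPLITS = {
    "de_gsd_ud_train": (113219544, 13814),
    "hu_szeged_ud_train": (7458360, 910),
    "it_parlamint_ud_train": (2671896, 326),
    "cs_pdt_ud_train_l": (340617564, 41559),
}

DOWNLOAD_SIZE = 357289141
DATASET_SIZE = 5683860432

for name, (num_bytes, num_examples) in SPLITS.items():
    per_example, remainder = divmod(num_bytes, num_examples)
    print(f"{name}: {per_example} bytes/example (remainder {remainder})")

bytes_per_example = SPLITS["de_gsd_ud_train"][0] // SPLITS["de_gsd_ud_train"][1]
int32_slots = bytes_per_example // INT32_BYTES      # 4-byte units per example
implied_examples = DATASET_SIZE // bytes_per_example  # if every split matches
compression_ratio = DATASET_SIZE / DOWNLOAD_SIZE

print(f"{int32_slots} int32-sized slots per example")
print(f"{implied_examples} examples implied by the total dataset size")
print(f"download is about {compression_ratio:.1f}x smaller than the dataset")
```

The implied total holds only if every split shares the same per-example size; the sketch verifies this for four splits, not for all of them.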



