
spdenisov/ud

Hugging Face · Updated 2023-03-11 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/spdenisov/ud
Resource description:
---
dataset_info:
  features:
    - name: input_ids
      sequence: int32
  splits: 59 training splits (each with name, num_bytes, and num_examples; the values are listed in the split table below)
  download_size: 357289141
  dataset_size: 5683860432
---
# Dataset Card for "ud"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
spdenisov
Summary of original information

Dataset overview

Dataset features

  • Name: input_ids
  • Sequence type: int32
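
Since the only feature is a sequence of int32 values named input_ids, a minimal loading sketch might look like the following. It assumes the Hugging Face `datasets` package is installed and that the repository can be loaded by the id shown on this page; the helper names `load_ud_split` and `check_example` are hypothetical and not part of the dataset card.

```python
from typing import Any

DATASET_ID = "spdenisov/ud"  # repository id as shown on this page
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def load_ud_split(split: str) -> Any:
    """Load one split from the Hub (needs network access and `datasets`)."""
    # Imported lazily so check_example below works without the package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def check_example(example: dict) -> int:
    """Verify that `input_ids` holds int32-range integers; return its length."""
    ids = example["input_ids"]
    if not all(isinstance(i, int) and INT32_MIN <= i <= INT32_MAX for i in ids):
        raise ValueError("input_ids must contain only int32-range integers")
    return len(ids)


# Usage (not executed here because it downloads data):
#   ds = load_ud_split("en_ewt_ud_train")
#   print(len(ds), check_example(ds[0]))
```

The card does not say which tokenizer produced the ids, so decoding them back to text is left out of this sketch.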

Dataset splits

Name Bytes Examples
de_gsd_ud_train 113219544 13814
fr_parisstories_ud_train 11392440 1390
ar_nyuad_ud_train 129406644 15789
hu_szeged_ud_train 7458360 910
fi_ftb_ud_train 122784276 14981
it_vit_ud_train 67838292 8277
zh_gsdsimp_ud_train 32759412 3997
hy_bsut_ud_train 10048296 1226
cop_scriptorium_ud_train 11302284 1379
no_bokmaal_ud_train 128644416 15696
gv_cadhan_ud_train 9605712 1172
fr_partut_ud_train 6581388 803
fi_tdt_ud_train 100130532 12217
tr_kenet_ud_train 126202008 15398
fr_sequoia_ud_train 18285276 2231
hy_armtdp_ud_train 16178904 1974
ga_idt_ud_train 32824980 4005
ru_syntagrus_ud_train_b 199146408 24298
nl_alpino_ud_train 100720644 12289
en_partut_ud_train 14597076 1781
en_gum_ud_train 56642556 6911
ru_taiga_ud_train 131504820 16045
da_ddt_ud_train 35923068 4383
zh_gsd_ud_train 32759412 3997
de_hdt_ud_train_a_2 307472940 37515
it_isdt_ud_train 107539716 13121
es_gsd_ud_train 116276652 14187
gd_arcosg_ud_train 29022036 3541
ru_syntagrus_ud_train_a 200933136 24516
de_hdt_ud_train_a_1 312283992 38102
tr_penn_ud_train 121710600 14850
tr_atis_ud_train 35029704 4274
en_lines_ud_train 26030496 3176
cs_pdt_ud_train_c 73255848 8938
de_hdt_ud_train_b_2 319701372 39007
cs_pdt_ud_train_l 340617564 41559
cs_pdt_ud_train_v 55880328 6818
pt_cintil_ud_train 251781120 30720
no_nynorsklia_ud_train 27964752 3412
cs_cac_ud_train 192425688 23478
tr_framenet_ud_train 18752448 2288
it_parlamint_ud_train 2671896 326
es_ancora_ud_train 117096252 14287
en_atis_ud_train 35029704 4274
cy_ccg_ud_train 9105756 1111
fr_gsd_ud_train 118432200 14450
tr_tourism_ud_train 126841296 15476
en_ewt_ud_train 102810624 12544
ru_syntagrus_ud_train_c 170607936 20816
fr_ftb_ud_train 120964764 14759
cs_fictree_ud_train 83271360 10160
ar_padt_ud_train 49790700 6075
ro_rrt_ud_train 65920428 8043
tr_boun_ud_train 63953388 7803
de_hdt_ud_train_b_1 314816556 38411
cs_pdt_ud_train_m 91631280 11180
no_nynorsk_ud_train 116170104 14174
ru_gsd_ud_train 31554600 3850
fr_rhapsodie_ud_train 10556448 1288
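
The split names in the table appear to follow a `<language>_<treebank>_ud_train[_<shard>]` pattern. That convention is inferred from the names themselves and is not documented on the card, and the meaning of suffixes such as `a_2` or `l` is unknown. Under that assumption, a short sketch for parsing the names and grouping treebanks by language code (the function names are hypothetical):

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed naming pattern: language code, treebank id, "_ud_train", optional shard suffix.
SPLIT_RE = re.compile(
    r"^(?P<lang>[a-z]+)_(?P<treebank>[a-z]+)_ud_train(?:_(?P<shard>\w+))?$"
)


def parse_split(name: str) -> Tuple[str, str, Optional[str]]:
    """Split a name like 'de_hdt_ud_train_a_2' into (lang, treebank, shard)."""
    match = SPLIT_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected split name: {name!r}")
    return match["lang"], match["treebank"], match["shard"]


def treebanks_by_language(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each language code to the sorted treebank ids seen for it."""
    grouped = defaultdict(set)
    for name in names:
        lang, treebank, _shard = parse_split(name)
        grouped[lang].add(treebank)
    return {lang: sorted(banks) for lang, banks in grouped.items()}


sample = [
    "en_ewt_ud_train",
    "en_gum_ud_train",
    "ru_syntagrus_ud_train_a",
    "ru_syntagrus_ud_train_b",
    "de_hdt_ud_train_a_2",
]
print(treebanks_by_language(sample))
# {'en': ['ewt', 'gum'], 'ru': ['syntagrus'], 'de': ['hdt']}
```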

Dataset size

  • Download size: 357289141 bytes
  • Dataset size: 5683860432 bytes
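
The sizes above can be cross-checked against the split table with a little arithmetic. The sketch below copies a few rows from the table and divides bytes by examples. The resulting figure of 8196 bytes per example is derived here, not stated on the card. Reading it as 2048 int32 ids (8192 bytes) plus 4 bytes of per-row overhead is only one possible interpretation and should be treated as an assumption.

```python
# A few rows copied from the split table: name, bytes, examples.
ROWS = """\
de_gsd_ud_train 113219544 13814
hu_szeged_ud_train 7458360 910
it_parlamint_ud_train 2671896 326
cs_pdt_ud_train_l 340617564 41559
"""
DOWNLOAD_SIZE = 357289141  # bytes, from the card
DATASET_SIZE = 5683860432  # bytes, from the card


def bytes_per_example(rows: str) -> dict:
    """Return {split name: bytes per example}, requiring exact division."""
    result = {}
    for line in rows.strip().splitlines():
        name, num_bytes, num_examples = line.split()
        quotient, remainder = divmod(int(num_bytes), int(num_examples))
        if remainder:
            raise ValueError(f"{name}: size is not a whole multiple of the example count")
        result[name] = quotient
    return result


per_example = bytes_per_example(ROWS)
record_sizes = set(per_example.values())
print(record_sizes)  # {8196}

# If every example takes 8196 bytes, the total size implies this many examples.
total_examples, leftover = divmod(DATASET_SIZE, 8196)
print(total_examples, leftover)  # 693492 0

# Ratio of the in-memory dataset size to the download size.
ratio = DATASET_SIZE / DOWNLOAD_SIZE
print(round(ratio, 2))  # 15.91
```

The implied total of 693492 examples equals the sum of the example counts over all 59 rows of the split table, so the per-split and overall figures are mutually consistent.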