
spdenisov/udtrees

Hugging Face · Updated 2023-03-29 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/spdenisov/udtrees
Resource description:
---
dataset_info:
  features:
  - name: language
    dtype: string
  - name: sentence
    dtype: string
  - name: conllu
    dtype: string
  splits:
  - name: ru_syntagrus_ud_train_a
    num_bytes: 39997721
    num_examples: 24516
  - name: en_ewt_ud_train
    num_bytes: 13066595
    num_examples: 12544
  - name: es_ancora_ud_train
    num_bytes: 41576563
    num_examples: 14287
  - name: ga_idt_ud_train
    num_bytes: 6578580
    num_examples: 4005
  - name: tr_tourism_ud_train
    num_bytes: 5072132
    num_examples: 15476
  - name: ar_nyuad_ud_train
    num_bytes: 46449076
    num_examples: 15789
  - name: cop_scriptorium_ud_train
    num_bytes: 3127527
    num_examples: 1379
  - name: tr_kenet_ud_train
    num_bytes: 9965621
    num_examples: 15398
  - name: ar_padt_ud_train
    num_bytes: 39971051
    num_examples: 6075
  - name: tr_penn_ud_train
    num_bytes: 11428060
    num_examples: 14850
  - name: es_gsd_ud_train
    num_bytes: 22823430
    num_examples: 14187
  - name: fi_tdt_ud_train
    num_bytes: 13228364
    num_examples: 12217
  - name: nl_alpino_ud_train
    num_bytes: 13981525
    num_examples: 12289
  - name: fi_ftb_ud_train
    num_bytes: 10264036
    num_examples: 14981
  - name: ru_syntagrus_ud_train_b
    num_bytes: 42083027
    num_examples: 24298
  - name: no_nynorsk_ud_train
    num_bytes: 14940608
    num_examples: 14174
  - name: de_hdt_ud_train_a_2
    num_bytes: 49150973
    num_examples: 37515
  - name: hu_szeged_ud_train
    num_bytes: 1445467
    num_examples: 910
  - name: cs_pdt_ud_train_l
    num_bytes: 79765505
    num_examples: 41559
  - name: de_hdt_ud_train_a_1
    num_bytes: 50530678
    num_examples: 38102
  - name: tr_boun_ud_train
    num_bytes: 7821321
    num_examples: 7803
  - name: fr_gsd_ud_train
    num_bytes: 22444299
    num_examples: 14450
  - name: no_bokmaal_ud_train
    num_bytes: 14918030
    num_examples: 15696
  - name: fr_partut_ud_train
    num_bytes: 1515774
    num_examples: 803
  - name: de_gsd_ud_train
    num_bytes: 19353463
    num_examples: 13814
  - name: fr_rhapsodie_ud_train
    num_bytes: 1191845
    num_examples: 1288
  - name: en_partut_ud_train
    num_bytes: 2341782
    num_examples: 1781
  - name: cs_cac_ud_train
    num_bytes: 52776214
    num_examples: 23478
  - name: fr_sequoia_ud_train
    num_bytes: 3107869
    num_examples: 2231
  - name: cs_pdt_ud_train_c
    num_bytes: 14988159
    num_examples: 8938
  - name: en_gum_ud_train
    num_bytes: 10299158
    num_examples: 6911
  - name: hy_armtdp_ud_train
    num_bytes: 5096313
    num_examples: 1974
  - name: ru_gsd_ud_train
    num_bytes: 6690467
    num_examples: 3850
  - name: it_parlamint_ud_train
    num_bytes: 641089
    num_examples: 326
  - name: no_nynorsklia_ud_train
    num_bytes: 1951602
    num_examples: 3412
  - name: tr_framenet_ud_train
    num_bytes: 1198915
    num_examples: 2288
  - name: gd_arcosg_ud_train
    num_bytes: 4010492
    num_examples: 3541
  - name: de_hdt_ud_train_b_2
    num_bytes: 51033245
    num_examples: 39007
  - name: it_vit_ud_train
    num_bytes: 14017218
    num_examples: 8277
  - name: zh_gsdsimp_ud_train
    num_bytes: 5375774
    num_examples: 3997
  - name: fr_ftb_ud_train
    num_bytes: 24036178
    num_examples: 14759
  - name: cy_ccg_ud_train
    num_bytes: 1370915
    num_examples: 1111
  - name: de_hdt_ud_train_b_1
    num_bytes: 53015860
    num_examples: 38411
  - name: zh_gsd_ud_train
    num_bytes: 5375739
    num_examples: 3997
  - name: hy_bsut_ud_train
    num_bytes: 2570067
    num_examples: 1226
  - name: fr_parisstories_ud_train
    num_bytes: 1434200
    num_examples: 1390
  - name: gv_cadhan_ud_train
    num_bytes: 547774
    num_examples: 1172
  - name: ro_rrt_ud_train
    num_bytes: 14443371
    num_examples: 8043
  - name: pt_cintil_ud_train
    num_bytes: 19037477
    num_examples: 30720
  - name: ru_taiga_ud_train
    num_bytes: 14956116
    num_examples: 16045
  - name: cs_pdt_ud_train_m
    num_bytes: 20158243
    num_examples: 11180
  - name: tr_atis_ud_train
    num_bytes: 2633984
    num_examples: 4274
  - name: cs_pdt_ud_train_v
    num_bytes: 14454519
    num_examples: 6818
  - name: it_isdt_ud_train
    num_bytes: 19225718
    num_examples: 13121
  - name: ru_syntagrus_ud_train_c
    num_bytes: 30439785
    num_examples: 20816
  - name: cs_fictree_ud_train
    num_bytes: 13380642
    num_examples: 10160
  - name: en_atis_ud_train
    num_bytes: 2524032
    num_examples: 4274
  - name: en_lines_ud_train
    num_bytes: 3264741
    num_examples: 3176
  - name: da_ddt_ud_train
    num_bytes: 5047075
    num_examples: 4383
  - name: fa_seraji_ud_train
    num_bytes: 11517586
    num_examples: 4798
  - name: fa_perdt_ud_train
    num_bytes: 29881906
    num_examples: 26196
  download_size: 335579995
  dataset_size: 1045535496
---

# Dataset Card for "udtrees"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provided by:
spdenisov
Summary of original information

Dataset overview

Dataset features

  • language: string
  • sentence: string
  • conllu: string
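
Each record stores its annotation as a single string in the `conllu` field. The card does not describe that string, so the sketch below assumes it is standard CoNLL-U text: `#` comment lines followed by one token per line with ten tab-separated columns. The helper name `parse_conllu` and the sample sentence are illustrative and are not taken from the dataset.

```python
# Column names defined by the CoNLL-U format used by Universal Dependencies.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(block: str) -> list[dict[str, str]]:
    """Parse one CoNLL-U sentence into a list of token dicts.

    Assumes `block` is the content of the dataset's `conllu` column.
    """
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            continue  # not a well-formed token line; ignored in this sketch
        tokens.append(dict(zip(CONLLU_FIELDS, columns)))
    return tokens


# Hypothetical sample, not copied from the dataset.
sample = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

for token in parse_conllu(sample):
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```

All values stay as strings, including `id` and `head`, because CoNLL-U allows range and decimal IDs (multiword tokens and empty nodes) that do not convert cleanly to integers.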

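Dataset splits

The dataset contains training splits for multiple languages. Each split records the following:

  • Name: language and dataset type (for example, en_ewt_ud_train)
  • Bytes: size of the data
  • Examples: number of samples in the split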
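
The split names on this page appear to follow the shape `<language>_<treebank>_ud_train`, optionally followed by a shard suffix such as `_a` or `_b_2`. That pattern is inferred from the names alone and is not a documented scheme, so the regular expression and the `parse_split_name` helper below are a hedged sketch.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the split names listed on this page (an assumption,
# not a documented naming convention).
SPLIT_NAME = re.compile(
    r"^(?P<lang>[a-z]+)_(?P<treebank>[a-z]+)_ud_train(?:_(?P<part>\w+))?$"
)


class SplitName(NamedTuple):
    lang: str            # language code, e.g. "en"
    treebank: str        # treebank identifier, e.g. "ewt"
    part: Optional[str]  # shard suffix such as "a" or "b_2", if present


def parse_split_name(name: str) -> SplitName:
    """Break a split name into language, treebank, and optional shard suffix."""
    match = SPLIT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized split name: {name!r}")
    return SplitName(match["lang"], match["treebank"], match["part"])


for name in ["en_ewt_ud_train", "ru_syntagrus_ud_train_a", "de_hdt_ud_train_a_2"]:
    print(parse_split_name(name))
```

Raising on an unmatched name, rather than returning a partial result, makes it obvious when a split does not fit the assumed pattern.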

Dataset statistics

  • Download size: 335579995 bytes
  • Dataset size: 1045535496 bytes

Split details

Name Bytes Examples
ru_syntagrus_ud_train_a 39997721 24516
en_ewt_ud_train 13066595 12544
es_ancora_ud_train 41576563 14287
ga_idt_ud_train 6578580 4005
tr_tourism_ud_train 5072132 15476
ar_nyuad_ud_train 46449076 15789
cop_scriptorium_ud_train 3127527 1379
tr_kenet_ud_train 9965621 15398
ar_padt_ud_train 39971051 6075
tr_penn_ud_train 11428060 14850
es_gsd_ud_train 22823430 14187
fi_tdt_ud_train 13228364 12217
nl_alpino_ud_train 13981525 12289
fi_ftb_ud_train 10264036 14981
ru_syntagrus_ud_train_b 42083027 24298
no_nynorsk_ud_train 14940608 14174
de_hdt_ud_train_a_2 49150973 37515
hu_szeged_ud_train 1445467 910
cs_pdt_ud_train_l 79765505 41559
de_hdt_ud_train_a_1 50530678 38102
tr_boun_ud_train 7821321 7803
fr_gsd_ud_train 22444299 14450
no_bokmaal_ud_train 14918030 15696
fr_partut_ud_train 1515774 803
de_gsd_ud_train 19353463 13814
fr_rhapsodie_ud_train 1191845 1288
en_partut_ud_train 2341782 1781
cs_cac_ud_train 52776214 23478
fr_sequoia_ud_train 3107869 2231
cs_pdt_ud_train_c 14988159 8938
en_gum_ud_train 10299158 6911
hy_armtdp_ud_train 5096313 1974
ru_gsd_ud_train 6690467 3850
it_parlamint_ud_train 641089 326
no_nynorsklia_ud_train 1951602 3412
tr_framenet_ud_train 1198915 2288
gd_arcosg_ud_train 4010492 3541
de_hdt_ud_train_b_2 51033245 39007
it_vit_ud_train 14017218 8277
zh_gsdsimp_ud_train 5375774 3997
fr_ftb_ud_train 24036178 14759
cy_ccg_ud_train 1370915 1111
de_hdt_ud_train_b_1 53015860 38411
zh_gsd_ud_train 5375739 3997
hy_bsut_ud_train 2570067 1226
fr_parisstories_ud_train 1434200 1390
gv_cadhan_ud_train 547774 1172
ro_rrt_ud_train 14443371 8043
pt_cintil_ud_train 19037477 30720
ru_taiga_ud_train 14956116 16045
cs_pdt_ud_train_m 20158243 11180
tr_atis_ud_train 2633984 4274
cs_pdt_ud_train_v 14454519 6818
it_isdt_ud_train 19225718 13121
ru_syntagrus_ud_train_c 30439785 20816
cs_fictree_ud_train 13380642 10160
en_atis_ud_train 2524032 4274
en_lines_ud_train 3264741 3176
da_ddt_ud_train 5047075 4383
fa_seraji_ud_train 11517586 4798
fa_perdt_ud_train 29881906 26196

The table above gives the detailed split information, including the size and number of examples of each language's training set.
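
Because several languages are spread over more than one split, per-language totals are often more useful than per-split figures. The sketch below sums bytes and examples per language from rows in the same three-column layout as the table above. It uses only a handful of rows copied from that table, and it assumes the language code is the text before the first underscore. The function name `totals_by_language` is illustrative.

```python
from collections import defaultdict

# A few rows copied from the split table; paste the full table to cover every split.
TABLE = """\
en_ewt_ud_train 13066595 12544
en_partut_ud_train 2341782 1781
en_gum_ud_train 10299158 6911
en_atis_ud_train 2524032 4274
en_lines_ud_train 3264741 3176
hu_szeged_ud_train 1445467 910
"""


def totals_by_language(table: str) -> dict[str, tuple[int, int]]:
    """Sum (bytes, examples) per language code.

    The language code is assumed to be the part of the split name
    before the first underscore.
    """
    totals = defaultdict(lambda: [0, 0])
    for row in table.splitlines():
        if not row.strip():
            continue
        name, num_bytes, num_examples = row.split()
        lang = name.split("_", 1)[0]
        totals[lang][0] += int(num_bytes)
        totals[lang][1] += int(num_examples)
    return {lang: (b, n) for lang, (b, n) in totals.items()}


for lang, (num_bytes, num_examples) in sorted(totals_by_language(TABLE).items()):
    print(f"{lang}: {num_examples} examples, "
          f"{num_bytes / num_examples:.0f} bytes per example")
```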

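Collected summary
Dataset introduction
Background and challenges
Background overview
This is a multilingual dependency treebank dataset containing about 724,000 rows of text (the 61 training splits listed above total 724,486 examples), covering training subsets for Russian, English, Spanish, and many other languages. The data is stored in Parquet format and carries detailed syntactic annotation, which makes it suitable for natural language processing tasks such as dependency parsing and language modeling.
The content above was collected and summarized by 遇见数据集.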