SEACrowd/ud

Source: Hugging Face. Updated 2024-06-24; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/SEACrowd/ud
Resource overview:
Universal Dependencies (UD) is a project that is developing cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on an evolution of Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets. The dataset supports tasks such as POS tagging, dependency parsing, and machine translation. The languages included in the dataset are Indonesian (ind), Vietnamese (vie), and Tagalog (tgl).
Provider:
SEACrowd
Original information summary

Dataset overview

Basic information

  • License: Apache 2.0 (apache-2.0)
  • Languages:
    • Indonesian (ind)
    • Vietnamese (vie)
    • Tagalog (tgl)
  • Task categories:
    • POS tagging (pos-tagging)
    • Dependency parsing (dependency-parsing)
    • Machine translation (machine-translation)

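Dataset description

Universal Dependencies (UD) is a project that develops cross-linguistically consistent treebank annotation, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).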
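To make the annotation scheme concrete, here is a minimal sketch of reading one sentence in CoNLL-U, the 10-column tab-separated format in which UD treebanks are distributed. The sample sentence and its tags were written by hand for this illustration and are not taken from the dataset; `Token` and `parse_conllu_sentence` are hypothetical helper names, and the sketch ignores the columns it does not need.

```python
from dataclasses import dataclass
from typing import List

# Hand-written illustrative fragment (an assumption, not data from SEACrowd/ud).
# CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
SAMPLE = """\
# text = Saya makan nasi
1\tSaya\tsaya\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tmakan\tmakan\tVERB\t_\t_\t0\troot\t_\t_
3\tnasi\tnasi\tNOUN\t_\t_\t2\tobj\t_\t_
"""


@dataclass
class Token:
    id: int      # 1-based position in the sentence
    form: str    # surface word form
    upos: str    # universal part-of-speech tag
    head: int    # ID of the syntactic head, 0 for the root
    deprel: str  # dependency relation to the head


def parse_conllu_sentence(block: str) -> List[Token]:
    """Parse the basic token lines of a single CoNLL-U sentence."""
    tokens = []
    for line in block.splitlines():
        # Skip blank lines and sentence-level comments such as "# text = ..."
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1")
        if not cols[0].isdigit():
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


tokens = parse_conllu_sentence(SAMPLE)
for tok in tokens:
    head_form = "ROOT" if tok.head == 0 else tokens[tok.head - 1].form
    print(f"{tok.form}\t{tok.upos}\t{tok.deprel} -> {head_form}")
```

Each token carries both a part-of-speech tag and a labeled arc to its head, which is why a single treebank can serve POS tagging and dependency parsing alike.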

Supported tasks

  • POS tagging
  • Dependency parsing
  • Machine translation
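The first two tasks are commonly scored with tagging accuracy and with unlabeled and labeled attachment scores (UAS and LAS). The dataset card itself does not name any metric, so the following is only a sketch of these conventional choices, using hand-written toy predictions; the function names are hypothetical.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency relation) for one token


def pos_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted POS tag equals the gold tag."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS): head-only accuracy and head-plus-label accuracy."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return uas, las


# Toy three-token sentence: every head is right, one label and one tag are wrong.
gold_arcs = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred_arcs = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold_arcs, pred_arcs)
pos_acc = pos_accuracy(["PRON", "VERB", "NOUN"], ["PRON", "VERB", "PROPN"])
print(f"UAS={uas:.2f} LAS={las:.2f} POS accuracy={pos_acc:.2f}")
```

LAS can never exceed UAS, because an arc only counts toward LAS when its head is already correct.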

Dataset versions

  • Source version: 2.13.0
  • SEACrowd version: 2024.06.20

Loading the dataset

Using datasets

    from datasets import load_dataset

    # trust_remote_code=True allows the dataset's own loading script to run
    dset = load_dataset("SEACrowd/ud", trust_remote_code=True)

Using seacrowd

    import seacrowd as sc

    # Load the dataset using the default config
    dset = sc.load_dataset("ud", schema="seacrowd")

    # Check all available subsets (config names) of the dataset
    print(sc.available_config_names("ud"))

    # Load the dataset using a specific config
    dset = sc.load_dataset_by_config_name(config_name="<config_name>")
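Once the list of config names is available, choosing a value for `<config_name>` is a matter of filtering strings. The sketch below shows one way to do that offline. `select_configs` is a hypothetical helper, and the example names are invented placeholders, since the real names returned by `sc.available_config_names("ud")` are not listed here and may follow a different pattern.

```python
from typing import Iterable, List


def select_configs(config_names: Iterable[str], *required: str) -> List[str]:
    """Return the config names that contain every required substring."""
    return [name for name in config_names if all(part in name for part in required)]


# Invented placeholder names for illustration only; in practice, pass the
# list returned by sc.available_config_names("ud") instead.
example_names = [
    "ud_xx_treebank_source",
    "ud_xx_treebank_seacrowd_seq_label",
    "ud_yy_treebank_source",
]

matches = select_configs(example_names, "xx", "seacrowd")
print(matches)
```

A name selected this way can then be passed as `config_name` to `sc.load_dataset_by_config_name`.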

Dataset homepage

https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-5287

Citation

If you use the Ud dataloader, please cite the following:

@misc{11234/1-5287, title = {Universal Dependencies 2.13}, author = {Zeman, Daniel and Nivre, Joakim and Abrams, Mitchell and Ackermann, Elia and Aepli, No{"e}mi and Aghaei, Hamid and Agi{c}, {v Z}eljko and Ahmadi, Amir and Ahrenberg, Lars and Ajede, Chika Kennedy and Akkurt, Salih Furkan and Aleksandravi{v c}i{=u}t{.e}, Gabriel{.e} and Alfina, Ika and Algom, Avner and Alnajjar, Khalid and Alzetta, Chiara and Andersen, Erik and Antonsen, Lene and Aoyama, Tatsuya and Aplonova, Katya and Aquino, Angelina and Aragon, Carolina and Aranes, Glyd and Aranzabe, Maria Jesus and Ar{i}can, Bilge Nas and Arnard{o}ttir, { H}{o}runn and Arutie, Gashaw and Arwidarasti, Jessica Naraiswari and Asahara, Masayuki and {A}sgeirsd{o}ttir, Katla and Aslan, Deniz Baran and Asmazo{u g}lu, Cengiz and Ateyah, Luma and Atmaca, Furkan and Attia, Mohammed and Atutxa, Aitziber and Augustinus, Liesbeth and Avel{~a}s, Mariana and Badmaeva, Elena and Balasubramani, Keerthana and Ballesteros, Miguel and Banerjee, Esha and Bank, Sebastian and Barbu Mititelu, Verginica and Barkarson, Starkaður and Basile, Rodolfo and Basmov, Victoria and Batchelor, Colin and Bauer, John and Bedir, Seyyit Talha and Behzad, Shabnam and Belieni, Juan and Bengoetxea, Kepa and Benli, İbrahim and Ben Moshe, Yifat and Berk, G{"o}zde and Bhat, Riyaz Ahmad and Biagetti, Erica and Bick, Eckhard and Bielinskien{.e}, Agn{.e} and Bjarnad{o}ttir, Krist{{i}}n and Blokland, Rogier and Bobicev, Victoria and Boizou, Lo{"{i}}c and Borges V{"o}lker, Emanuel and B{"o}rstell, Carl and Bosco, Cristina and Bouma, Gosse and Bowman, Sam and Boyd, Adriane and Braggaar, Anouck and Branco, Ant{o}nio and Brokait{.e}, Kristina and Burchardt, Aljoscha and Campos, Marisa and Candito, Marie and Caron, Bernard and Caron, Gauthier and Carvalheiro, Catarina and Carvalho, Rita and Cassidy, Lauren and Castro, Maria Clara and Castro, S{e}rgio and Cavalcanti, Tatiana and Cebiro{u g}lu Eryi{u g}it, G{"u}l{c s}en and Cecchini, Flavio Massimiliano and Celano, 
Giuseppe G. A. and {v C}{e}pl{"o}, Slavom{{i}}r and Cesur, Neslihan and Cetin, Savas and {c C}etino{u g}lu, {"O}zlem and Chalub, Fabricio and Chamila, Liyanage and Chauhan, Shweta and Chi, Ethan and Chika, Taishi and Cho, Yongseok and Choi, Jinho and Chun, Jayeol and Chung, Juyeon and Cignarella, Alessandra T. and Cinkov{a}, Silvie and Collomb, Aur{e}lie and {c C}{"o}ltekin, {c C}a{u g}r{i} and Connor, Miriam and Corbetta, Claudia and Corbetta, Daniela and Costa, Francisco and Courtin, Marine and Crabb{e}, Beno{^{i}}t and Cristescu, Mihaela and Cvetkoski, Vladimir and Dale, Ingerid L{o}yning and Daniel, Philemon and Davidson, Elizabeth and de Alencar, Leonel Figueiredo and Dehouck, Mathieu and de Laurentiis, Martina and de Marneffe, Marie-Catherine and de Paiva, Valeria and Derin, Mehmet Oguz and de Souza, Elvis and Diaz de Ilarraza, Arantza and Dickerson, Carly and Dinakaramani, Arawinda and Di Nuovo, Elisa and Dione, Bamba and Dirix, Peter and Dobrovoljc, Kaja and Doyle, Adrian and Dozat, Timothy and Droganova, Kira and Duran, Magali Sanches and Dwivedi, Puneet and Ebert, Christian and Eckhoff, Hanne and Eguchi, Masaki and Eiche, Sandra and Eli, Marhaba and Elkahky, Ali and Ephrem, Binyam and Erina, Olga and Erjavec, Toma{v z} and Essaidi, Farah and Etienne, Aline and Evelyn, Wograine and Facundes, Sidney and Farkas, Rich{a}rd and Favero, Federica and Ferdaousi, Jannatul and Fernanda, Mar{{i}}lia and Fernandez Alcalde, Hector and Fethi, Amal and Foster, Jennifer and Fransen, Theodorus and Freitas, Cl{a}udia and Fujita, Kazunori and Gajdo{v s}ov{a}, Katar{{i}}na and Galbraith, Daniel and Gamba, Federica and Garcia, Marcos and G{"a}rdenfors, Moa and Gerardi, Fabr{{i}}cio Ferraz and Gerdes, Kim and Gessler, Luke and Ginter, Filip and Godoy, Gustavo and Goenaga, Iakes and Gojenola, Koldo and G{"o}k{i}rmak, Memduh and Goldberg, Yoav and G{o}mez Guinovart, Xavier and Gonz{a}lez Saavedra, Berta and Grici{=u}t{.e}, Bernadeta and Grioni, Matias and Grobol, Lo{"{i}}c and 
Gr{= u}z{={i}}tis, Normunds and Guillaume, Bruno and Guiller, Kirian and Guillot-Barbance, C{e}line and G{"u}ng{"o}r, Tunga and Habash, Nizar and Hafsteinsson, Hinrik and Haji{v c}, Jan and Haji{v c} jr., Jan and H{"a}m{"a}l{"a}inen, Mika and H{a} M{~y}, Linh and Han, Na-Rae and Hanifmuti, Muhammad Yudistira and Harada, Takahiro and Hardwick, Sam and Harris, Kim and Haug, Dag and Heinecke, Johannes and Hellwig, Oliver and Hennig, Felix and Hladk{a}, Barbora and Hlav{a}{v c}ov{a}, Jaroslava and Hociung, Florinel and Hohle, Petter and Huang, Yidi and Huerta Mendez, Marivel and Hwang, Jena and Ikeda, Takumi and Ingason, Anton Karl and Ion, Radu and Irimia, Elena and Ishola, {d O}l{a}j{{i}}d{e} and Islamaj, Artan and Ito, Kaoru and Jagodzi{ }ska, Sandra and Jannat, Siratun and Jel{{i}}nek, Tom{a}{v s} and Jha, Apoorva and Jiang, Katharine and Johannsen, Anders and J{o}nsd{o}ttir, Hildur and J{o}rgensen, Fredrik and Juutinen, Markus and Ka{c s}{i}kara, H{"u}ner and Kabaeva, Nadezhda and Kahane, Sylvain and Kanayama, Hiroshi and Kanerva, Jenna and Kara, Neslihan and Karah{o}ǧa, Ritv{a}n and K{aa}sen, Andre and Kayadelen, Tolga and Kengatharaiyer, Sarveswaran and Kettnerov{a}, V{a}clava and Kharatyan, Lilit and Kirchner, Jesse and Klementieva, Elena and Klyachko, Elena and Kocharov, Petr and K{"o}hn, Arne and K{"o}ksal, Abdullatif and Kopacewicz, Kamil and Korkiakangas, Timo and K{"o}se, Mehmet and Koshevoy, Alexey and Kotsyba, Natalia and Kovalevskait{.e}, Jolanta and Krek, Simon and Krishnamurthy, Parameswari and K{"u}bler, Sandra and Kuqi, Adrian and Kuyruk{c c}u, O{u g}uzhan and Kuzgun, Asl{i} and Kwak, Sookyoung and Kyle, Kris and Laan, K{"a}bi and Laippala, Veronika and Lambertino, Lorenzo and Lando, Tatiana and Larasati, Septina Dian and Lavrentiev, Alexei and Lee, John and L{^e} H{{^o}}ng, Phương and Lenci, Alessandro and Lertpradit, Saran and Leung, Herman and Levina, Maria and Levine, Lauren and Li, Cheuk Ying and Li, Josie and Li, Keying and Li, Yixuan and Li, 
Yuan and Lim, {KyungTae} and Lima Padovani, Bruna and Lin, Yi-Ju Jessica and Lind{e}n, Krister and Liu, Yang Janet and Ljube{v s}i{c}, Nikola and Lobzhanidze, Irina and Loginova, Olga and Lopes, Lucelene and Lusito, Stefano and Luthfi, Andry and Luukko, Mikko and Lyashevskaya, Olga and Lynn, Teresa and Macketanz, Vivien and Mahamdi, Menel and Maillard, Jean and Makarchuk, Ilya and Makazhanov, Aibek and Mandl, Michael and Manning, Christopher and Manurung, Ruli and Mar{c s}an, B{"u}{c s}ra and M{u a}r{u a}nduc, C{u a}t{u a}lina and Mare{v c}ek, David and Marheinecke, Katrin and Markantonatou, Stella and Mart{{i}}nez Alonso, H{e}ctor and Mart{{i}}n Rodr{{i}}guez, Lorena and Martins, Andr{e} and Martins, Cl{a}udia and Ma{v s}ek, Jan and Matsuda, Hiroshi and Matsumoto, Yuji and Mazzei, Alessandro and {McDonald}, Ryan and {McGuinness}, Sarah and Mendon{c c}a, Gustavo and Merzhevich, Tatiana and Miekka, Niko and Miller, Aaron and Mischenkova, Karina and Missil{"a}, Anna and Mititelu, C{u a}t{u a}lin and Mitrofan, Maria and Miyao, Yusuke and Mojiri Foroushani, {AmirHossein} and Moln{a}r, Judit and Moloodi, Amirsaeid and Montemagni, Simonetta and More, Amir and Moreno Romero, Laura and Moretti, Giovanni and Mori, Shinsuke and Morioka, Tomohiko and Moro, Shigeki and Mortensen, Bjartur and Moskalevskyi, Bohdan and Muischnek, Kadri and Munro, Robert and Murawaki, Yugo and M{"u}{"u}risep, Kaili and Nainwani, Pinkey and Nakhl{e}, Mariam and Navarro Hor{~n}iacek, Juan Ignacio and Nedoluzhko, Anna and Ne{v s}pore-B{=e}rzkalne, Gunta and Nevaci, Manuela and Nguy{~{^e}}n Th{d i}, Lương and Nguy{~{^e}}n Th{d i} Minh, Huy{{^e}}n and Nikaido, Yoshihiro and Nikolaev, Vitaly and Nitisaroj, Rattima and Nourian, Alireza and Nunes, Maria das Gra{c c}as Volpe and Nurmi, Hanna and Ojala, Stina and Ojha, Atul Kr. 
and {O}lad{o}ttir, Hulda and Ol{u}{o}kun, Ad{e}day{d o}̀ and Omura, Mai and Onwuegbuzia, Emeka and Ordan, Noam and Osenova, Petya and {"O}stling, Robert and {O}vrelid, Lilja and {"O}zate{c s}, {c S}aziye Bet{"u}l and {"O}z{c c}elik, Merve and {"O}zg{"u}r, Arzucan and {"O}zt{"u}rk Ba{c s}aran, Balk{i}z and Paccosi, Teresa and Palmero Aprosio, Alessio and Panova, Anastasia and Pardo, Thiago Alexandre Salgueiro and Park, Hyunji Hayley and Partanen, Niko and Pascual, Elena and Passarotti, Marco and Patejuk, Agnieszka and Paulino-Passos, Guilherme and Pedonese, Giulia and Peljak-{L}api{ }ska, Angelika and Peng, Siyao and Peng, Siyao Logan and Pereira, Rita and Pereira, S{{i}}lvia and Perez, Cenel-Augusto and Perkova, Natalia and Perrier, Guy and Petrov, Slav and Petrova, Daria and Peverelli, Andrea and Phelan, Jason and Pierre-Louis, Claudel and Piitulainen, Jussi and Pinter, Yuval and Pinto, Clara and Pintucci, Rodrigo and Pirinen, Tommi A and Pitler, Emily and Plamada, Magdalena and Plank, Barbara and Poibeau, Thierry and Ponomareva, Larisa and Popel, Martin and Pretkalni{c n}a, Lauma and Pr{e}vost, Sophie and Prokopidis, Prokopis and Przepi{o}rkowski, Adam and Pugh, Robert and Puolakainen, Tiina and Pyysalo, Sampo and Qi, Peng and Querido, Andreia and R{"a}{"a}bis, Andriela and Rademaker, Alexandre and Rahoman, Mizanur and Rama, Taraka and Ramasamy, Loganathan and Ramisch, Carlos and Ramos, Joana and Rashel, Fam and Rasooli, Mohammad Sadegh and Ravishankar, Vinit and Real, Livy and Rebeja, Petru and Reddy, Siva and Regnault, Mathilde and Rehm, Georg and Riabi, Arij and Riabov, Ivan and Rie{ss}ler, Michael and Rimkut{.e}, Erika and Rinaldi, Larissa and Rituma, Laura and Riz
