Neural Morphology Dataset
Source: arXiv · Updated: 2021-05-26 · Indexed: 2024-06-21
Download link:
http://doi.org/10.5281/zenodo.3928628
Description:
The Neural Morphology Dataset is a multilingual resource created by the Faculty of Arts at the University of Helsinki. It targets morphologically rich languages and covers three tasks: morphological analysis, morphological generation, and lemmatization. The dataset spans 22 languages, 17 of which are endangered, and is built by automatically extracting large amounts of training data from finite-state transducers (FSTs). Because it uses the same tag set as the FSTs, neural models trained on it can serve as fallback models for the FST systems and increase their overall coverage. The dataset is intended as a resource for morphological research, in particular for the preservation and study of endangered languages.
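The fallback arrangement described above can be illustrated with a minimal sketch. It assumes an analyzer is any function that maps a word form to a list of analysis strings. The names `analyze_with_fallback`, `toy_fst`, and `toy_neural`, the toy lexicon, and the tag strings are all hypothetical and are not part of the dataset or of any FST toolkit. The point is only that a shared tag set lets the two analyzers be swapped without converting their output.

```python
from typing import Callable, Dict, List

# An analyzer maps a surface form to zero or more analysis strings.
Analyzer = Callable[[str], List[str]]


def analyze_with_fallback(word: str, fst: Analyzer, neural: Analyzer) -> List[str]:
    """Return the FST analyses if there are any, otherwise the neural guesses."""
    analyses = fst(word)
    if analyses:
        return analyses
    # The FST does not cover this word form, so fall back to the neural model.
    return neural(word)


# Hypothetical stand-ins for a real FST and a trained neural model.
# The tag strings are illustrative only.
TOY_LEXICON: Dict[str, List[str]] = {"kissat": ["kissa+N+Pl+Nom"]}


def toy_fst(word: str) -> List[str]:
    # A real FST returns every analysis its lexicon and rules license.
    return TOY_LEXICON.get(word, [])


def toy_neural(word: str) -> List[str]:
    # A real model would predict lemma and tags; this stub always guesses
    # a singular nominative noun so that the example stays self-contained.
    return [word + "+N+Sg+Nom"]


if __name__ == "__main__":
    for form in ("kissat", "koira"):
        print(form, analyze_with_fallback(form, toy_fst, toy_neural))
```

In this sketch the FST result is always preferred when it exists, and the neural output is used only for word forms the FST cannot analyze.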
Provider:
Faculty of Arts, University of Helsinki
Created:
2021-05-26



