Gujarati Morphological Analyzer Dataset
Source: arXiv | Updated: 2021-12-18 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2112.09860v1
Description:
The Gujarati Morphological Analyzer Dataset was created by Dharmsinh Desai University for training and evaluating morphological analyzers for the Gujarati language. It contains 16,527 unique Gujarati words, each annotated with its root form and grammatical features. To build it, the authors identified the grammatical features of Gujarati and extracted data from the Gujarati Monolingual Text Corpus ILCI-II. The dataset targets morphological analysis of Gujarati and supports natural language processing tasks such as part-of-speech tagging and morphological boundary detection.
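As a rough illustration of what "a word annotated with its root form and grammatical features" might look like in code, here is a minimal Python sketch. The tab-separated layout, the `MorphEntry` class, the `parse_line` helper, and the feature names (`pos`, `gender`, `number`) are all assumptions made for this example; the listing does not specify the dataset's actual file format or tag set.

```python
from dataclasses import dataclass, field


@dataclass
class MorphEntry:
    """One annotated word: surface form, root, and grammatical features."""
    word: str
    root: str
    features: dict[str, str] = field(default_factory=dict)


def parse_line(line: str) -> MorphEntry:
    # Hypothetical layout (not the dataset's documented format):
    #   word<TAB>root<TAB>key=value|key=value
    word, root, feats = line.rstrip("\n").split("\t")
    features = dict(pair.split("=", 1) for pair in feats.split("|") if pair)
    return MorphEntry(word=word, root=root, features=features)


# Illustrative record: the plural noun "boys" and its root "boy".
sample = "છોકરાઓ\tછોકરો\tpos=noun|gender=masculine|number=plural"
entry = parse_line(sample)
print(entry.word, "->", entry.root, entry.features)
```

Keeping the features in a plain dictionary means the sketch does not commit to any particular tag inventory, which would have to come from the dataset itself.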
Provider:
Dharmsinh Desai University
Created:
2021-12-18



