
SBU-WSDCorpus

Indexed from arXiv (2021-07-04); updated 2024-06-21
Download link:
https://github.com/hrouhizadeh/SBU-WSDCorpus
Overview:

SBU-WSDCorpus is the first standard test set for Persian all-words word sense disambiguation (WSD), developed at Shahid Beheshti University. It contains 19 Persian documents from diverse domains, totaling 5,892 content words, of which 3,371 have been manually annotated with word senses. The dataset was built in four steps: data collection, sense inventory selection, annotation, and dataset formatting. SBU-WSDCorpus is intended mainly for evaluating and improving Persian all-words WSD systems, addressing the problem of word sense disambiguation for Persian in natural language processing (NLP).
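To make the evaluation use concrete, here is a minimal sketch of how an all-words WSD system's output might be scored against gold sense annotations such as those in SBU-WSDCorpus. It assumes a plain-text key format with one line per annotated instance (an instance ID followed by one or more acceptable sense labels) and exactly one predicted sense per instance. The function names `parse_gold_keys` and `score`, the instance IDs, and the sense labels are all hypothetical, and this is not the repository's official scorer; check the repository for the actual file format.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_gold_keys(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Parse gold lines of the ASSUMED form '<instance_id> <sense> [<sense> ...]'."""
    gold: Dict[str, Set[str]] = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        gold[parts[0]] = set(parts[1:])
    return gold


def score(gold: Dict[str, Set[str]], pred: Dict[str, str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one predicted sense per instance.

    precision = correct / answered, recall = correct / all gold instances.
    A prediction is correct if it is one of the instance's gold senses.
    """
    answered = correct = 0
    for instance_id, sense in pred.items():
        if instance_id not in gold:
            continue  # ignore predictions for instances with no gold annotation
        answered += 1
        if sense in gold[instance_id]:
            correct += 1
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical instance IDs and sense labels, for illustration only.
    gold = parse_gold_keys([
        "d001.s001.t001 sense_a",
        "d001.s001.t002 sense_b sense_c",
        "d001.s002.t001 sense_d",
    ])
    pred = {"d001.s001.t001": "sense_a", "d001.s001.t002": "sense_x"}
    print(score(gold, pred))
```

In this toy run the system answers two of three instances and gets one right, so precision is 0.5, recall is 1/3, and F1 is 0.4. Because precision is computed over answered instances and recall over all gold instances, a system that skips hard words can raise precision at the cost of recall; F1 balances the two.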
Provided by:
Shahid Beheshti University
Created:
2021-07-04