3K Candidate Synonyms Dataset
arXiv · Updated: 2023-02-05 · Indexed: 2024-06-21
Download link:
https://portal.sina.birzeit.edu/synonyms
Description:
This dataset was created by the Department of Computer Science at Birzeit University, Palestine. It contains 3,000 candidate synonyms for 500 synonym sets, and each candidate was annotated by four linguists with a fuzzy value reflecting the strength of the synonymy relation. The dataset has two stated purposes: to measure how far linguists agree in their synonymy judgments, and to serve as a benchmark for evaluating synonym extraction algorithms. It is also used to train algorithms that extract synonyms from existing vocabularies and compute a fuzzy value for each candidate synonym. The dataset is relevant to research on synonymy relations in natural language processing and knowledge organization systems, particularly for low-resource and highly ambiguous languages such as Arabic.
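As a rough illustration of how four per-linguist fuzzy values might be summarized, the sketch below computes a mean score and a spread for each candidate. It assumes scores lie in [0, 1] and uses invented candidate names and values; the dataset's actual file layout and scoring scale are not described here, so treat every name and number as hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical annotations: candidate -> scores from four linguists.
# Assumption: fuzzy values lie in [0, 1]. These numbers are invented
# for illustration and are not taken from the dataset.
annotations = {
    "candidate_a": [1.0, 0.9, 1.0, 0.8],
    "candidate_b": [0.2, 0.6, 0.1, 0.7],
}


def summarize(scores):
    """Return (mean fuzzy value, population standard deviation)."""
    return mean(scores), pstdev(scores)


# A low spread suggests the linguists agree; a high spread suggests they do not.
summary = {name: summarize(scores) for name, scores in annotations.items()}

for name, (avg, spread) in summary.items():
    print(f"{name}: mean={avg:.3f} spread={spread:.3f}")
```

Standard deviation is only one simple way to look at agreement; a proper inter-annotator agreement statistic may be more appropriate for a real analysis.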
Provided by:
Department of Computer Science, Birzeit University, Palestine
Created:
2023-02-05



