Name not provided
Source: arXiv   Updated: 2016-06-18   Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/1605.04553v2
Resource description:
This paper describes a method for constructing a linguistic similarity dataset based on commonality lists. The dataset is intended to test meaning representation models against human judgments of similarity between words, phrases and sentences. Rather than asking participants only for a similarity rating, the construction procedure has them list the commonalities and differences between the items being compared, which avoids the ambiguity of bare similarity scores. The dataset is suitable for evaluating the similarity components of natural language processing (NLP) systems, such as those used in word sense disambiguation, information retrieval and machine translation.
Provided by:
Queen Mary University of London
Created:
2016-05-15



