CARD-660
Source: arXiv | 2018-08-28 | Updated: 2024-06-21
Download link:
https://pilehvar.github.io/card-660/
Resource overview:
CARD-660 is an expert-annotated word similarity dataset developed at the Department of Theoretical and Applied Linguistics, University of Cambridge, specifically for evaluating rare-word representation models. It contains 660 word pairs involving rare words, carefully selected from multiple domains to make the dataset both diverse and challenging. Each word pair was rated by eight experts, and disagreements were resolved through a final adjudication step to ensure the quality and reliability of the annotations. CARD-660 is mainly used to evaluate rare-word and subword representation models in natural language processing, targeting the problem of representing rare words for semantic understanding.
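Evaluating a representation model on a word-similarity benchmark such as this one typically means scoring each word pair with the model (here, the cosine similarity of the two word vectors) and then correlating those scores with the human ratings. The sketch below is a minimal illustration under stated assumptions: the data is already in memory as a list of (word1, word2, gold score) tuples plus a dict from word to vector; it does not parse the released files, whose format is not described here. The helper names (`evaluate`, `average_ranks`, etc.), the out-of-vocabulary policy (`oov_score`), and the toy vectors are hypothetical and not part of CARD-660. It uses only the standard library; `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same two statistics.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str, float]  # (word1, word2, gold similarity score)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return float("nan")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs: List[Pair], embeddings: Dict[str, Sequence[float]],
             oov_score: float = 0.0) -> Tuple[float, float]:
    """Return (Pearson r, Spearman rho) between gold scores and model cosines.

    Hypothetical OOV policy: a pair containing an out-of-vocabulary word is
    scored oov_score instead of being dropped, so every model sees all pairs.
    """
    gold, predicted = [], []
    for w1, w2, score in pairs:
        gold.append(score)
        if w1 in embeddings and w2 in embeddings:
            predicted.append(cosine(embeddings[w1], embeddings[w2]))
        else:
            predicted.append(oov_score)
    return pearson(gold, predicted), spearman(gold, predicted)


# Toy, made-up data for illustration only (not taken from CARD-660).
toy_embeddings = {
    "cat": [1.0, 0.0], "kitten": [0.9, 0.1],
    "car": [0.0, 1.0], "auto": [0.1, 0.9],
}
toy_pairs = [
    ("cat", "kitten", 3.8), ("car", "auto", 3.5),
    ("cat", "car", 0.5), ("kitten", "zyxq", 1.0),  # "zyxq" is out of vocabulary
]
r, rho = evaluate(toy_pairs, toy_embeddings)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Scoring out-of-vocabulary pairs rather than skipping them matters on a rare-word benchmark: if a model were evaluated only on the pairs it happens to cover, a small vocabulary could yield an easier, unrepresentative subset.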
Provided by:
Department of Theoretical and Applied Linguistics, University of Cambridge
Created:
2018-08-28
Dataset introduction

Background and challenges
Background overview
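CARD-660 (the Cambridge Rare Word Dataset) is used to evaluate subword and rare-word representation techniques. It has high inter-annotator agreement (about 0.90) and broad domain coverage (e.g., IT, technology, slang, and medicine). It is a reliable yet challenging benchmark that leaves room for future research, since there is a significant gap between current state-of-the-art techniques and the level of human inter-annotator agreement.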
The above content was collected and summarized by 遇见数据集 (Meet Datasets).
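The summary does not say how the inter-annotator agreement figure of about 0.90 was computed. Purely as an illustration, the sketch below shows two conventions commonly used for word-similarity datasets: the mean Pearson correlation over every pair of annotators, and the mean correlation of each annotator against the average of the remaining annotators. Either may or may not match the procedure behind the reported number; the function names and toy ratings are hypothetical. A model's correlation with the gold scores can then be read against this human-agreement level as a rough ceiling.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return float("nan")
    return cov / (sx * sy)


def pairwise_agreement(ratings: List[Sequence[float]]) -> float:
    """Mean Pearson r over every pair of annotators.

    ratings[a][i] is annotator a's score for word pair i.
    """
    rs = [pearson(a, b) for a, b in combinations(ratings, 2)]
    return sum(rs) / len(rs)


def leave_one_out_agreement(ratings: List[Sequence[float]]) -> float:
    """Mean Pearson r between each annotator and the average of all others."""
    n_items = len(ratings[0])
    rs = []
    for k, own in enumerate(ratings):
        others = [r for j, r in enumerate(ratings) if j != k]
        consensus = [sum(r[i] for r in others) / len(others) for i in range(n_items)]
        rs.append(pearson(own, consensus))
    return sum(rs) / len(rs)


# Made-up ratings from three hypothetical annotators on four word pairs.
toy_ratings = [
    [4.0, 3.0, 1.0, 0.0],
    [4.0, 2.0, 1.0, 0.0],
    [3.0, 3.0, 0.0, 1.0],
]
print(f"pairwise: {pairwise_agreement(toy_ratings):.3f}")
print(f"leave-one-out: {leave_one_out_agreement(toy_ratings):.3f}")
```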



