WordSim-353(WordSimilarity-353)
OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/WordSim-353_WordSimilarity-353
Resource description:
The first set (set1) contains 153 word pairs along with their similarity scores assigned by 13 raters. The second set (set2) contains 200 word pairs, whose similarity was evaluated by 16 raters. The names of the raters have been replaced with ordinal numbers (1 .. 13 or 1 .. 16) to protect their privacy; the same numerical value across the two sets does not necessarily correspond to the same individual.
All raters in both experiments possessed near-native English proficiency. They were instructed to estimate the semantic relatedness of paired words on a scale from 0 (completely unrelated words) to 10 (highly related or identical words). Explicit instructions are provided in the file instructions.txt within the ZIP archive (please refer to the "Availability and Usage" section below).
Each set provides the raw scores assigned by each rater, as well as the average score for each individual word pair. For convenience, a combined set (Combined) containing the list of all 353 word pairs and their average similarity scores is provided. The combined set is merely the concatenation of the two smaller sets.
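As a rough illustration of how the per-rater scores relate to the per-pair averages and the combined set, the sketch below parses two small CSV excerpts, averages each row's rater scores, and concatenates the results. The column names (`Word 1`, `Word 2`, numbered rater columns), the placeholder words, and all scores are invented for this example and are not taken from the dataset; the actual files in the ZIP archive may be laid out differently.

```python
import csv
import io
from statistics import mean

# Made-up excerpts for illustration only: the words, scores, and column
# names are assumptions, not values from the real WordSim-353 files.
SET1_CSV = """Word 1,Word 2,1,2,3
wordA,wordB,7,8,6
wordC,wordD,8,7,9
"""
SET2_CSV = """Word 1,Word 2,1,2,3,4
wordE,wordF,8,9,8,7
"""


def load_pairs(text):
    """Return (word1, word2, mean score) for each row of per-rater scores."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        w1 = record.pop("Word 1")
        w2 = record.pop("Word 2")
        # Every remaining column is treated as one rater's score.
        scores = [float(v) for v in record.values()]
        rows.append((w1, w2, mean(scores)))
    return rows


# The combined set is just the two smaller sets placed one after the other.
combined = load_pairs(SET1_CSV) + load_pairs(SET2_CSV)

for w1, w2, avg in combined:
    print(f"{w1}\t{w2}\t{avg:.2f}")
```

Because the two sets were rated by different groups of raters (13 and 16 in the real data), averaging within each set before concatenating keeps each mean tied to the raters who actually scored that pair.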
Provider:
OpenDataLab
Created:
2023-03-30
Collected Summary
Dataset Introduction

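Background and Challenges
Background Overview
The WordSim-353 dataset contains 353 English word pairs, split into two sets: the first set of 153 pairs was rated by 13 people, and the second set of 200 pairs was rated by 16 people. Participants rated word similarity on a scale of 0 to 10. The dataset was released in 2002 by the Technion (Israel Institute of Technology) and provides both the raw ratings and the mean scores, for use in semantic similarity research.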
The content above was collected and summarized by 遇见数据集.



