PAP900
Source: doi.org | Updated: 2024-11-04 | Indexed: 2025-03-24
Download link:
http://doi.org/10.17632/5mhxtv8pn2.2
Resource description:
The PAP900 dataset centers on the semantic relationships between affective words in Portuguese. It contains 900 word pairs, each annotated by at least 30 human raters for both semantic similarity and semantic relatedness. In addition to the semantic ratings, the dataset includes the word categorization used to build the word pairs and detailed sociodemographic information about the annotators, which enables analysis of how personal factors influence the perception of semantic relationships. Furthermore, this article describes the dataset construction process in detail, from word selection to agreement metrics.
Data were collected from Portuguese university psychology students, who completed two rounds of questionnaires. In the first round, annotators were asked to rate word pairs on either semantic similarity or semantic relatedness. The second round switched the relation type for most annotators, with a small percentage being asked to rate the same relation again. The instructions emphasized the differences between semantic relatedness and semantic similarity, and provided examples of expected ratings for both.
There are few semantic relation datasets in Portuguese, and none focusing on affective words. PAP900 is distributed in several formats, so that it is easy to use both for researchers who only need the final averaged values and for researchers who want to take advantage of the individual ratings, the word categorization, and the annotator data.
Provider:
doi.org



