
Potentially Euphemistic Terms (PETs) Corpus

Source: arXiv. Updated: 2022-05-06. Indexed: 2024-06-21.
Download link:
https://github.com/marsgav/euphemism_project
Resource description:
The Potentially Euphemistic Terms (PETs) Corpus was created by researchers at Montclair State University. It contains 1,965 sentences drawn from the Corpus of Global Web-Based English (GloWbE) and is intended for analyzing and identifying potential euphemisms. The dataset covers a range of sensitive topics, including death, sexual activity, and employment. It aims to address the problem of euphemism recognition in natural language processing by comparing the sentiment of euphemistic expressions with that of their direct counterparts. To build the corpus, the research team collected 184 potentially euphemistic terms from multiple sources and manually annotated how each term is used in its sentence, distinguishing euphemistic from non-euphemistic uses.
Provider:
Montclair State University
Created:
2022-05-06