Don't Patronize Me!
Source: arXiv | Updated: 2020-11-17 | Indexed: 2024-06-21
Download link:
https://github.com/Perez-AlmendrosC/dontpatronizeme
Overview:
The Don't Patronize Me! dataset was created by the School of Computer Science and Informatics at Cardiff University to support the development of NLP models that identify and classify patronizing and condescending language (PCL) directed at vulnerable groups such as refugees, homeless people, and poor families. It contains more than 10,000 paragraphs extracted from news stories, annotated to indicate the presence of PCL at the text-span level. The paragraphs come from English-language news sources in 20 countries and cover a range of vulnerable groups. Three expert annotators with backgrounds in communication, media, and data science labeled the data, and the dataset is accompanied by a taxonomy for categorizing PCL aimed at vulnerable groups. Its main intended application is addressing the unfair treatment of vulnerable groups in the media, with the broader aims of promoting social inclusion and reducing inequality.
Provider:
School of Computer Science and Informatics, Cardiff University
Created:
2020-11-17



