
M-Phasis corpus

Source: arXiv (2022-04-28); entry updated 2024-06-21
Download link:
https://github.com/uds-lsv/mphasis
Overview:

The M-Phasis corpus, developed by the Spoken Language Systems Group at Saarland University, contains approximately 9,000 German and French user comments posted in response to news articles on immigration. Rather than relying on a simple hate/neutral binary, it characterizes a spectrum of speech types, from critical comments to implicit and explicit hate speech, using 23 distinct features. The comments were collected from a range of mainstream and fringe media platforms and annotated by four native speakers per language to ensure high inter-annotator agreement. Intended applications include in-depth studies of the dynamics and characteristics of online hate speech and the development of corresponding countermeasures.
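The overview does not state which agreement coefficient was used for the four annotators per language. Purely as an illustration, the sketch below computes Fleiss' kappa, a standard coefficient when every item is rated by the same number of annotators, for a single binary feature (present/absent). The function name `fleiss_kappa` and the toy counts are hypothetical and are not taken from the corpus or its release files.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a hypothetical rating table.

    counts[i][j] is the number of annotators who assigned item i
    (e.g. one comment) to category j (e.g. feature absent/present).
    Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all assignments that fall into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per item: fraction of agreeing annotator pairs.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items

    # Agreement expected by chance from the category marginals.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy example (not corpus data): four annotators, three comments,
# columns = [feature absent, feature present].
print(fleiss_kappa([[4, 0], [0, 4], [2, 2]]))
```

For this toy table the result is 5/9 (about 0.556): two comments receive unanimous labels and one is split 2 to 2.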
Provider:
Spoken Language Systems Group, Saarland University
Created:
2022-04-28