
AustroTox

Source: arXiv. Updated: 2024-06-12. Indexed: 2024-06-14.
Download links:
https://www.pia.wien/austrotox/, https://web.ds-ifs.tuwien.ac.at/austrotox/
Resource overview:
AustroTox is an offensive language detection dataset for the Austrian German dialect, created by the Vienna University of Technology. It contains 4,562 user comments from a news forum. In addition to a binary offensiveness label for each comment, the dataset marks the spans within a comment that constitute vulgar language or that are the targets of offensive statements. The comments were sampled from the discussion forums of the newspaper DerStandard and labeled by annotators with professional and academic backgrounds. AustroTox is mainly intended to improve how well language models detect offensive language in a specific cultural and linguistic context, addressing the need for personalized and region-specific content moderation.
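The two annotation layers described above (a per-comment binary label plus labeled spans) could be represented as in the minimal sketch below. This is only an illustration: the class names, field names, span labels ("vulgarity", "target"), and the placeholder comment are all assumptions made here, not the dataset's actual schema or contents. Consult the download pages for the real file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """A character range inside a comment (hypothetical layout)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    kind: str   # assumed labels: "vulgarity" or "target"


@dataclass
class Comment:
    """One annotated comment: binary label plus optional spans."""
    text: str
    offensive: bool
    spans: List[Span] = field(default_factory=list)


def span_texts(comment: Comment, kind: str) -> List[str]:
    """Return the substrings covered by spans of the given kind."""
    return [comment.text[s.start:s.end] for s in comment.spans if s.kind == kind]


# Placeholder text, not taken from the dataset.
text = "Placeholder comment about some_user"
start = text.index("some_user")
example = Comment(
    text=text,
    offensive=True,
    spans=[Span(start=start, end=start + len("some_user"), kind="target")],
)

print(span_texts(example, "target"))
```

Storing spans as character offsets rather than copied substrings keeps the annotation unambiguous when the same word occurs more than once in a comment.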
Provider:
Vienna University of Technology
Created:
2024-06-12