TEET! Tunisian Dataset for Toxic Speech Detection
Source: arXiv. Updated: 2021-10-11. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2110.05287v1
Description:
TEET! Tunisian Dataset for Toxic Speech Detection is a labeled dataset of approximately 10,000 comments created by the Mediterranean Institute of Technology. It is designed for detecting toxic and abusive content in the Tunisian dialect, which mixes several languages, including Modern Standard Arabic (MSA), Tamazight, Italian, and French. The comments were collected from social media platforms and annotated using specific keywords. The dataset is mainly intended for automatically detecting and limiting harmful content on social media, making moderation of such content more efficient.
Provider:
Mediterranean Institute of Technology
Created:
2021-10-11



