The ComMA Dataset v0.2 Dataset

Name: The ComMA Dataset v0.2 Dataset
Creator: Papers with Code
License: No description available

Indexed from paperswithcode.com on 2025-01-21

Download link:

https://paperswithcode.com/dataset/the-comma-dataset-v0-2

Description:

The ComMA Dataset v0.2 is a multilingual dataset annotated with a hierarchical, fine-grained tagset that marks different types of aggression and the "context" in which they occur. Context is defined here by the conversational thread in which a comment occurs and by the "type" of discursive role the comment performs with respect to the previous comment. The initial dataset discussed here (made available as part of the ComMA@ICON shared task) consists of a total of 15,000 annotated comments in four languages (Meitei, Bangla, Hindi, and Indian English), collected from social media platforms such as YouTube, Facebook, Twitter, and Telegram. As is usual on social media websites, a large number of these comments are multilingual, mostly code-mixed with English.

Provided by:

Papers with Code
