Hindi Constraint Dataset
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/shikhar00778/constraint21
Description:
This dataset supports the detection of hostile Hindi-language posts on social media. Each post is labeled as hostile or non-hostile, and hostile posts are further assigned to subcategories such as misinformation (fake content), hate speech, offensive content, and defamation. The classes are imbalanced, and evaluation uses the weighted F1 score. Posts average approximately 30 tokens in length. The training set contains 5728 samples and the validation set contains 811 samples. The task is to detect and classify hostile posts.
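Because evaluation uses the weighted F1 score, a minimal sketch of that metric may help. It treats the coarse hostile/non-hostile task as single-label classification and weights each class's F1 by its share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions, not part of the dataset or its official scorer.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (single-label case)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its support.
        score += (n / total) * f1
    return score


# Toy, made-up labels (not drawn from the dataset): 6 non-hostile, 2 hostile.
y_true = ["non-hostile"] * 6 + ["hostile"] * 2
y_pred = ["non-hostile"] * 6 + ["non-hostile", "hostile"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Since weights follow class frequency, the majority class dominates the score, so per-class F1 is worth inspecting alongside it on imbalanced data.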
Provider:
AAAI



