Uli Dataset

Name: Uli Dataset
Creator: Tattle
License: CC BY 4.0 (as stated in the description)

Source: arXiv, indexed 2025-09-30

Download link:

https://github.com/tattle-made/uli_dataset

Description:

This dataset contains tweets with sexist abusive content in three South Asian languages: Hindi, Tamil, and Indian English. The tweets were annotated by experts who self-identify as women or as members of the South Asian LGBTQIA community. The dataset is shared on GitHub as CSV files under the CC BY 4.0 license and is intended for the automated detection of hate speech and sexist abuse.
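
Since the data is distributed as CSV, a first look at it might be a simple label count. The following is a minimal sketch using only the Python standard library. The column names (`text`, `language`, `label`) and the sample rows are assumptions made up for illustration; the listing does not document the actual file names or schema, so check the repository before adapting it.

```python
import csv
import io
from collections import Counter


def count_labels(csv_text, label_column="label"):
    """Count how often each value appears in one column of CSV text.

    `label_column` is a hypothetical column name, not confirmed by the source.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


# Made-up rows for illustration only; these are not taken from the dataset.
sample = (
    "text,language,label\n"
    "example one,hi,1\n"
    "example two,ta,0\n"
    "example three,en,1\n"
)

counts = count_labels(sample)
print(counts)
```

To run this on a downloaded file, read the file contents with `open(path, encoding="utf-8", newline="")` and pass the text to `count_labels`, substituting the real column name.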

Provided by:

Tattle
