ADIMA

Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/sharechatai/adima
Description:
ADIMA is a dataset of 11,775 audio clips drawn from real-world conversations in 10 Indian languages. Each clip is annotated as abusive or non-abusive, so the core task is binary classification. The two classes are roughly balanced, with 5,108 abusive and 6,667 non-abusive samples, and the clips come from 6,446 unique users. The dataset also supports few-shot learning experiments with varying shot sizes.
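As a rough illustration of the class split and the few-shot setup described above, the sketch below computes the share of each class from the two quoted counts and draws a k-shot support set per label. The label encoding (1 for the flagged class, 0 for the other), the `(clip_id, label)` pair format, and the function names are assumptions made for this example; they are not taken from the ADIMA repository.

```python
import random
from collections import defaultdict

# Class counts quoted in the description (1 = flagged class, 0 = other class).
N_POSITIVE = 5108
N_NEGATIVE = 6667


def class_shares(n_pos, n_neg):
    """Return the total clip count and the fraction of each class."""
    total = n_pos + n_neg
    return total, n_pos / total, n_neg / total


def sample_k_shot(examples, k, seed=0):
    """Draw k clips per label from (clip_id, label) pairs.

    The pair format is hypothetical; adapt it to the real annotation files.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for clip_id, label in examples:
        by_label[label].append(clip_id)

    support = {}
    for label, clip_ids in by_label.items():
        if len(clip_ids) < k:
            raise ValueError(f"label {label!r} has fewer than {k} clips")
        # Sample without replacement so no clip appears twice in a support set.
        support[label] = rng.sample(clip_ids, k)
    return support
```

With the quoted counts, `class_shares(N_POSITIVE, N_NEGATIVE)` gives a total of 11,775 clips, of which about 43.4% fall in the flagged class and about 56.6% in the other. Calling `sample_k_shot` with different values of `k` is one simple way to vary the shot size.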
Provided by:
ShareChat