ADIMA

Name: ADIMA
Creator: ShareChat
License: No description provided

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/sharechatai/adima

Resource description:

ADIMA contains 11,775 audio clips drawn from real-world conversations in 10 Indian languages. Each clip is annotated as abusive or non-abusive, so the core task is binary classification of audio. The classes are roughly balanced: 5,108 abusive samples and 6,667 non-abusive samples, recorded from 6,446 unique users. The dataset also supports few-shot learning experiments with varying numbers of shots.
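The class counts quoted above can be checked with a little arithmetic, and a few-shot experiment amounts to drawing k clips per class. The following is a minimal sketch under stated assumptions, not the dataset's own tooling: the integer labels (1 for the flagged class, 0 for the other), the helper names `class_shares` and `sample_k_shot`, and the synthetic label list are all hypothetical stand-ins for the real annotation files.

```python
import random
from collections import Counter

# Class counts as stated in the resource description.
# Label encoding is an assumption: 1 = flagged clip, 0 = not flagged.
COUNTS = {1: 5108, 0: 6667}


def class_shares(counts):
    """Return each class's fraction of the total number of clips."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def sample_k_shot(labels, k, seed=0):
    """Pick k clip indices per class from a list of labels (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    picked = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < k:
            raise ValueError(f"class {label} has fewer than {k} clips")
        picked.extend(rng.sample(indices, k))
    return sorted(picked)


shares = class_shares(COUNTS)

# Synthetic label list standing in for the real annotations.
labels = [1] * COUNTS[1] + [0] * COUNTS[0]
subset = sample_k_shot(labels, k=50)

print(sum(COUNTS.values()))
print({label: round(share, 3) for label, share in shares.items()})
print(Counter(labels[i] for i in subset))
```

The two counts sum to the stated total of 11,775, with the flagged class making up about 43% of the clips. Sampling the same k from each class keeps the few-shot subset balanced even though the full dataset is slightly skewed.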

Provider:

ShareChat
