ADIMA
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/sharechatai/adima
Description:
ADIMA contains 11,775 audio clips drawn from real-world conversations in 10 Indian languages. Each clip is annotated as abusive or non-abusive, so the core task is binary classification of audio. The classes are roughly balanced, with 5,108 abusive and 6,667 non-abusive samples, and the clips come from 6,446 unique users. The dataset also supports few-shot learning experiments with varying shot sizes.
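As a rough illustration of the class balance and the few-shot setting described above, the following minimal sketch computes the class shares from the stated counts and draws a k-shot support set. The label list is a placeholder built from those counts, not the real annotations, and the helper names `class_shares` and `sample_k_shot` are hypothetical; the dataset's own loading code and few-shot protocol may differ.

```python
import random

# Counts as stated in the description; 1 = positive class, 0 = negative class.
N_POSITIVE = 5108
N_NEGATIVE = 6667


def class_shares(n_pos, n_neg):
    """Return the fraction of clips in each class."""
    total = n_pos + n_neg
    return n_pos / total, n_neg / total


def sample_k_shot(labels, k, seed=0):
    """Return indices of k examples per class, drawn without replacement."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    picked = []
    for label in sorted(by_class):
        # random.sample raises ValueError if a class has fewer than k items.
        picked.extend(rng.sample(by_class[label], k))
    return picked


# Placeholder labels standing in for the real annotations (assumption).
labels = [1] * N_POSITIVE + [0] * N_NEGATIVE
pos_share, neg_share = class_shares(N_POSITIVE, N_NEGATIVE)
support = sample_k_shot(labels, k=10)
```

With these counts the positive class makes up about 43% of the clips, so a classifier that always predicts the majority class would reach roughly 57% accuracy; that is a useful floor when reading reported results.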
Provider:
ShareChat



