
Indonesian Toxic Speech Dataset (IndoToxSpeech)

Source: IEEE; indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/indonesian-toxic-speech-dataset-indotoxspeech
Description:
This dataset contains audio recordings and transcriptions of toxic speech drawn from Indonesian conversations in YouTube videos in which scammers are confronted. It captures two separate interactions that escalate into toxic exchanges. Each interaction has been verified by native Indonesian speakers and labeled using two classes: toxic and non-toxic. The dataset includes both the original and preprocessed versions of the speech and text data. The original speech files total 136 MB, while the preprocessed speech files total 111.7 MB. Text transcriptions of the conversations are also included; both the original and preprocessed text files are 16 KB. The dataset can be used for research in toxic speech detection and natural language processing, and for developing machine learning models for audio and text classification.
Provided by:
Gumelar, Agustinus Bimo; Nugroho, Arif; Purnomo, Mauridhi Hery; Sugiarto, Indar; Yuniarno, Eko Mulyanto; Adi, Derry Pramono