Indonesian Toxic Speech Dataset (IndoToxSpeech)
Source: IEEE (indexed 2026-04-17)
Download link:
https://ieee-dataport.org/documents/indonesian-toxic-speech-dataset-indotoxspeech
Description:
This dataset contains audio recordings and transcriptions of toxic speech drawn from Indonesian conversations in YouTube videos in which scammers are confronted. It captures two separate interactions that escalate into toxic exchanges. Each interaction has been verified by native Indonesian speakers and labeled into two classes: toxic and non-toxic. The dataset includes both original and preprocessed versions of the speech and text data. The original speech files total 136 MB, and the preprocessed speech files total 111.7 MB. Text transcriptions of the conversations are also included; the original and preprocessed text files are both 16 KB. The dataset can be used for research in toxic speech detection, natural language processing, and the development of machine learning models for audio and text classification.
Provided by:
Gumelar, Agustinus Bimo; Nugroho, Arif; Purnomo, Mauridhi Hery; Sugiarto, Indar; Yuniarno, Eko Mulyanto; Adi, Derry Pramono



