AraTox: A Multi-Dialect, Multi-Label Arabic Dataset for Toxicity Detection
Source: DataCite Commons. Updated: 2026-04-22. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/zj6rsbpd6x
Resource description:
AraTox is a multi-dialect, multi-label Arabic dataset for toxicity detection. It contains annotated Arabic text representing multiple varieties, including Gulf, Levantine, Nile Basin, North African, Yemeni, and Modern Standard Arabic (MSA).
The dataset is designed to support research in toxic language detection, multi-label classification, and the benchmarking of Arabic NLP models across dialects.
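As a rough illustration of the multi-label, cross-dialect benchmarking mentioned above, the following sketch encodes label sets as multi-hot vectors and computes a micro-averaged F1 score per dialect. The label names (`label_a`, `label_b`, `label_c`), the record layout, and the function names are all hypothetical: the actual AraTox label inventory and file format are not described here and should be taken from the dataset itself.

```python
from collections import defaultdict

# Hypothetical label set; the real AraTox label inventory is not given here.
LABELS = ["label_a", "label_b", "label_c"]


def to_multi_hot(tags, labels=LABELS):
    """Encode a collection of label names as a 0/1 vector over `labels`."""
    tag_set = set(tags)
    return [1 if name in tag_set else 0 for name in labels]


def micro_f1(gold, pred):
    """Micro-averaged F1 over two parallel lists of multi-hot vectors."""
    tp = fp = fn = 0
    for gold_vec, pred_vec in zip(gold, pred):
        for g, p in zip(gold_vec, pred_vec):
            if g and p:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, report 0.0.
    return 2 * tp / denom if denom else 0.0


def micro_f1_by_dialect(records):
    """Group (dialect, gold_tags, pred_tags) records and score each dialect."""
    grouped = defaultdict(lambda: ([], []))
    for dialect, gold_tags, pred_tags in records:
        gold, pred = grouped[dialect]
        gold.append(to_multi_hot(gold_tags))
        pred.append(to_multi_hot(pred_tags))
    return {dialect: micro_f1(g, p) for dialect, (g, p) in grouped.items()}


# Toy records (assumed layout): dialect, gold labels, predicted labels.
records = [
    ("Gulf", ["label_a", "label_b"], ["label_a"]),
    ("Gulf", [], []),
    ("MSA", ["label_c"], ["label_c"]),
]
scores = micro_f1_by_dialect(records)
```

Scoring each dialect separately, rather than pooling all examples, makes it easier to see whether a model performs unevenly across varieties.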
This dataset accompanies the following publication:
Aratox: A multi-dialect, multi-label Arabic dataset and model benchmark for toxicity detection
If you use this dataset, please cite:
Alshargi, F., Abulohoom, A., Yagi, S., Jabr, F., Lulu, L., & Elnagar, A. (2026).
Aratox: A multi-dialect, multi-label Arabic dataset and model benchmark for toxicity detection.
Language Resources & Evaluation, 60, 39.
https://doi.org/10.1007/s10579-026-09917-9
Provider:
Mendeley Data
Created:
2026-04-22



