
Houdna-khilouf/Dz-Emotion

Hugging Face · Updated 2026-04-17 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Houdna-khilouf/Dz-Emotion
Description:
---
language:
- ar
pretty_name: Dz-Emotion
license: cc-by-4.0
task_categories:
- text-classification
task_ids:
- multi-class-classification
multilinguality: monolingual
size_categories:
- 1K<n<10K
tags:
- arabic
- algerian-arabic
- emotion-classification
- nlp
---

# Dz-Emotion: Algerian Dialect Dataset for Emotion Detection

## 📌 Dataset Description

**Dz-Emotion** is the first large-scale, manually annotated dataset for **emotion detection in Algerian Arabic dialect (Darija)**. It consists of **6,000 social media comments** collected from YouTube, Facebook, and Instagram, each labeled with one of **Ekman’s six basic emotions**: Anger, Sadness, Fear, Disgust, Happiness, and Surprise.

The dataset is designed to support **Natural Language Processing (NLP)** research on low-resource dialects, especially Algerian Arabic.

---

## 📄 Paper

For more details, see our paper:
👉 https://ieeexplore.ieee.org/document/11472633

---

## 🤖 Related Model

This dataset was used to train:

- Dz-EmoBERT: https://huggingface.co/Houdna-khilouf/Dz-EmoBERT

**Dz-EmoBERT** is a fine-tuned transformer model for emotion detection in Algerian dialect text. It achieves **94.08% accuracy** on this dataset.

---

## 📊 Dataset Structure

The dataset is provided as a CSV file with the following columns:

| Column | Description |
|--------|-------------|
| ID | Unique identifier for each comment |
| Text | Comment text (Algerian dialect) |
| Label | Emotion label (one of the six emotions) |
| Source | Source platform (YouTube, Facebook, or Instagram) |

---

## 📈 Dataset Statistics

- Total samples: **6,000**
- Classes: **6 emotions**
- Samples per class: **1,000 (balanced)**

### Emotion Distribution

- Anger: 1,000
- Sadness: 1,000
- Fear: 1,000
- Disgust: 1,000
- Happiness: 1,000
- Surprise: 1,000

### Data Sources

- YouTube: 53%
- Instagram: 29%
- Facebook: 18%

### Train/Validation Split

- Train: 80% (4,800 samples)
- Validation: 20% (1,200 samples)

---

## 🚀 Baseline Results

The dataset was used to fine-tune several models:

| Model | Accuracy |
|-------|----------|
| ARBERT | 86.00% |
| MARBERT | 91.67% |
| Dz-EmoBERT | **94.08%** |

---

## ⚠️ Limitations

- Data collected from social media may contain noise and bias.
- Only **six emotions (Ekman model)** are covered.
- The data is limited to the **Algerian dialect**.

---

## 📩 Contact

For questions or collaboration opportunities: h.khilouf@univ-eltarf.dz

---

## 📚 Citation

If you use this dataset, please cite:

```bibtex
@inproceedings{khilouf2025dzemotion,
  title={Dz-Emotion: An Algerian Dialect Dataset for Text-Based Emotion Detection},
  author={Khilouf, Houdna and Ziani, Amel and Malek, Nada Ahmed and Schwab, Didier and Yakoubi, Mohamed Amine},
  booktitle={2025 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI)},
  pages={1--6},
  year={2025},
  address={Sousse, Tunisia},
  doi={10.1109/ICRAMI64946.2025.11472633}
}
```
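The columns, class balance, and 80/20 split described above can be checked with a short script. The following is a minimal sketch in Python using only the standard library. It assumes the CSV header uses exactly the column names `ID`, `Text`, `Label`, and `Source` from the table. The file name `dz_emotion.csv`, the label spellings, and the helper names (`load_rows`, `stratified_split`, `make_synthetic_csv`) are hypothetical. The card does not say how the official split was drawn, so stratifying by label with a fixed seed is an assumption and will not reproduce the authors' exact partition. The synthetic CSV is a placeholder that lets the script run without downloading anything.

```python
import csv
import io
import random
from collections import Counter, defaultdict

# Columns listed in the Dataset Structure table of this card.
EXPECTED_COLUMNS = ["ID", "Text", "Label", "Source"]


def load_rows(f):
    """Read Dz-Emotion-style CSV rows from an open text file."""
    reader = csv.DictReader(f)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def stratified_split(rows, val_fraction=0.2, seed=42):
    """Hold out val_fraction of each Label group.

    Assumption: the official split may not be stratified or seeded this way.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["Label"]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


def make_synthetic_csv(per_class=1000):
    """Build a placeholder CSV shaped like the card describes (not real data)."""
    # Hypothetical label strings; the real file's label spelling may differ.
    labels = ["Anger", "Sadness", "Fear", "Disgust", "Happiness", "Surprise"]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(EXPECTED_COLUMNS)
    next_id = 1
    for label in labels:
        for _ in range(per_class):
            writer.writerow([next_id, f"placeholder text {next_id}", label, "YouTube"])
            next_id += 1
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # For the real data, use open("dz_emotion.csv", encoding="utf-8", newline="")
    # instead; that file name is hypothetical.
    rows = load_rows(make_synthetic_csv())
    print(len(rows), Counter(row["Label"] for row in rows))
    train, val = stratified_split(rows)
    print(len(train), len(val))
```

With 1,000 samples per class, a per-label 20% holdout places 200 samples of each emotion in validation and 800 in training. That gives the 4,800/1,200 totals stated on the card while keeping both sides balanced.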
Provider:
Houdna-khilouf