Houdna-khilouf/Dz-Emotion
Source: Hugging Face (updated 2026-04-17, indexed 2026-04-26)
Download link: https://hf-mirror.com/datasets/Houdna-khilouf/Dz-Emotion

---
language:
- ar
pretty_name: Dz-Emotion
license: cc-by-4.0
task_categories:
- text-classification
task_ids:
- multi-class-classification
multilinguality: monolingual
size_categories:
- 1K<n<10K
tags:
- arabic
- algerian-arabic
- emotion-classification
- nlp
---
# Dz-Emotion: Algerian Dialect Dataset for Emotion Detection
## 📌 Dataset Description
**Dz-Emotion** is the first large-scale, manually annotated dataset for **emotion detection in the Algerian Arabic dialect (Darija)**. It consists of **6,000 social media comments** collected from YouTube, Facebook, and Instagram, each labeled with one of **Ekman’s six basic emotions**: Anger, Sadness, Fear, Disgust, Happiness, or Surprise.
The dataset is designed to support research in **Natural Language Processing (NLP)** for low-resource dialects, especially Algerian Arabic.
---
## 📄 Paper
For more information, please visit our paper:
👉 https://ieeexplore.ieee.org/document/11472633
---
## 🤖 Related Model
This dataset was used to train:
- [Dz-EmoBERT](https://huggingface.co/Houdna-khilouf/Dz-EmoBERT)
**Dz-EmoBERT** is a fine-tuned transformer model for emotion detection in Algerian dialect text, achieving **94.08% accuracy** on this dataset.
---
## 📊 Dataset Structure
The dataset is provided as a CSV file with the following columns:
| Column | Description |
|--------|------------|
| ID | Unique identifier for each comment |
| Text | The comment text (Algerian dialect) |
| Label | Emotion label |
| Source | Platform source (YouTube, Facebook, Instagram) |
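
As a minimal sketch of how the four-column layout above could be read and sanity-checked, the snippet below uses only Python's standard `csv` module. It assumes the header row spells the column names exactly as in the table and that labels are stored as the English emotion names; neither is confirmed by this card, so adjust the constants if the real file differs. The two sample rows are made-up placeholders, not entries from the dataset. For the real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample, where `path` is wherever you saved the CSV.

```python
import csv
import io

# Column names come from the table above. The label and source spellings
# are assumptions; the real CSV may use different casing or integer codes.
EXPECTED_COLUMNS = ["ID", "Text", "Label", "Source"]
EMOTIONS = {"Anger", "Sadness", "Fear", "Disgust", "Happiness", "Surprise"}
SOURCES = {"YouTube", "Facebook", "Instagram"}


def load_rows(handle):
    """Read the CSV and check every row against the documented schema."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["Label"] not in EMOTIONS:
            raise ValueError(f"line {line_no}: unknown label {row['Label']!r}")
        if row["Source"] not in SOURCES:
            raise ValueError(f"line {line_no}: unknown source {row['Source']!r}")
        rows.append(row)
    return rows


# Placeholder rows for illustration only (not real dataset content).
sample = io.StringIO(
    "ID,Text,Label,Source\n"
    "1,placeholder comment one,Anger,YouTube\n"
    "2,placeholder comment two,Happiness,Facebook\n"
)
rows = load_rows(sample)
print(len(rows), rows[0]["Label"])
```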
---
## 📈 Dataset Statistics
- Total samples: **6,000**
- Classes: **6 emotions**
- Samples per class: **1,000 (balanced)**
### Emotion Distribution:
- Anger: 1,000
- Sadness: 1,000
- Fear: 1,000
- Disgust: 1,000
- Happiness: 1,000
- Surprise: 1,000
### Data Sources:
- YouTube: 53%
- Instagram: 29%
- Facebook: 18%
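
To make the figures above concrete, the short calculation below converts the per-platform percentages into approximate comment counts and confirms that six balanced classes add up to the stated total. The counts are only estimates: the card reports whole-number percentages, which are presumably rounded, so the true per-platform counts may differ slightly.

```python
TOTAL = 6000
PER_CLASS = 1000
NUM_CLASSES = 6
SOURCE_SHARE = {"YouTube": 53, "Instagram": 29, "Facebook": 18}  # percent

# Six balanced classes of 1,000 comments give the stated total.
assert PER_CLASS * NUM_CLASSES == TOTAL
# The platform shares cover the whole dataset.
assert sum(SOURCE_SHARE.values()) == 100

# Integer arithmetic avoids floating-point rounding surprises.
approx_counts = {name: TOTAL * pct // 100 for name, pct in SOURCE_SHARE.items()}
for name, count in approx_counts.items():
    print(f"{name}: about {count} comments")
# YouTube: about 3180 comments
# Instagram: about 1740 comments
# Facebook: about 1080 comments
```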
### Train/Validation Split:
- Train: 80% (4,800 samples)
- Validation: 20% (1,200 samples)
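
The card does not say how the 80/20 partition was drawn (random seed, stratified or not). The sketch below shows one plausible way to reproduce a split of the same size: a per-class shuffle that holds out 20% of each emotion, so both parts stay balanced. The function name `stratified_split` and the seed are illustrative choices, not the authors' procedure.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, holdout_fraction=0.2, seed=0):
    """Return (train_idx, holdout_idx), holding out the same share of each label."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train_idx, holdout_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        cut = round(len(indices) * holdout_fraction)
        holdout_idx.extend(indices[:cut])
        train_idx.extend(indices[cut:])
    return train_idx, holdout_idx


# Stand-in label column with the documented balance: 1,000 per emotion.
EMOTIONS = ["Anger", "Sadness", "Fear", "Disgust", "Happiness", "Surprise"]
labels = [emotion for emotion in EMOTIONS for _ in range(1000)]

train_idx, holdout_idx = stratified_split(labels)
holdout_counts = Counter(labels[i] for i in holdout_idx)
print(len(train_idx), len(holdout_idx))  # 4800 1200
print(holdout_counts["Anger"])           # 200
```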
---
## 🚀 Baseline Results
The dataset was used to fine-tune several models:
| Model | Accuracy |
|-------------|---------|
| ARBERT | 86.00% |
| MARBERT | 91.67% |
| Dz-EmoBERT | **94.08%** |
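
To give a feel for what these percentages mean in absolute terms, the calculation below converts each accuracy into an implied number of correctly classified comments. It assumes the models were evaluated on the 1,200-comment held-out portion listed in the statistics; the card does not state the evaluation set explicitly, so treat the counts as an interpretation rather than reported results.

```python
HELD_OUT = 1200  # assumption: the 20% split from the statistics section
REPORTED = {"ARBERT": 86.00, "MARBERT": 91.67, "Dz-EmoBERT": 94.08}  # percent

implied = {}
for model, pct in REPORTED.items():
    correct = round(HELD_OUT * pct / 100)
    implied[model] = correct
    # Convert back to a percentage to confirm it matches the reported figure.
    print(f"{model}: {correct}/{HELD_OUT} = {100 * correct / HELD_OUT:.2f}%")
# ARBERT: 1032/1200 = 86.00%
# MARBERT: 1100/1200 = 91.67%
# Dz-EmoBERT: 1129/1200 = 94.08%

# Under this assumption, the gap between the top two models is 29 comments.
print(implied["Dz-EmoBERT"] - implied["MARBERT"])  # 29
```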
---
## ⚠️ Limitations
- Data collected from social media may include noise and bias
- Focused only on **six emotions (Ekman model)**
- Limited to **Algerian dialect**
---
## 📩 Contact
For any questions or collaboration opportunities:
h.khilouf@univ-eltarf.dz
---
## 📚 Citation
If you use this dataset, please cite:
```bibtex
@inproceedings{khilouf2025dzemotion,
title={Dz-Emotion: An Algerian Dialect Dataset for Text-Based Emotion Detection},
author={Khilouf, Houdna and Ziani, Amel and Malek, Nada Ahmed and Schwab, Didier and Yakoubi, Mohamed Amine},
booktitle={2025 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI)},
pages={1--6},
year={2025},
address={Sousse, Tunisia},
doi={10.1109/ICRAMI64946.2025.11472633}
}
```
Provider: Houdna-khilouf



