hamza-amin/urdu-spam-dataset
Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/hamza-amin/urdu-spam-dataset
Resource description:
---
language: ur
tags:
- text-classification
- spam-detection
- urdu
- nlp
license: mit
task_categories:
- text-classification
task_ids:
- intent-classification
pretty_name: Urdu Spam Detection Dataset
size_categories:
- 1K<n<10K
---
# Urdu Spam Detection Dataset
## Description
This dataset is designed for classifying Urdu text into:
- **0 → Not Spam**
- **1 → Spam**
It is intended for training classifiers that let AI-powered emergency helpline systems (e.g., 1122/911) filter prank or irrelevant calls.

---
## Dataset Structure
**Format:** CSV
| Column | Type | Description |
|--------|------|-------------|
| text | string | Urdu sentence |
| label | int (0/1) | Class label: 0 = not spam, 1 = spam |

---
## Example
```
text,label
آپ کو میں نے پہلے بھی کال کیا تھا کیا یاد ہے,1
یہ ایک ایمرجنسی ہے پلیز ایمبولینس بھیجیں,0
```
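
As a minimal sketch of how rows in this format could be read, the snippet below parses the two sample rows shown above with Python's standard `csv` module and checks that each label is 0 or 1. The names `SAMPLE_CSV`, `LABEL_NAMES`, and `parse_rows` are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# The two sample rows from the card, embedded so the sketch runs offline.
SAMPLE_CSV = (
    "text,label\n"
    "آپ کو میں نے پہلے بھی کال کیا تھا کیا یاد ہے,1\n"
    "یہ ایک ایمرجنسی ہے پلیز ایمبولینس بھیجیں,0\n"
)

# Label meanings as stated in the Description section.
LABEL_NAMES = {0: "not spam", 1: "spam"}


def parse_rows(csv_text):
    """Parse CSV text with `text` and `label` columns into a list of dicts."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        label = int(record["label"])
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected label: {label}")
        rows.append({"text": record["text"], "label": label})
    return rows


rows = parse_rows(SAMPLE_CSV)
for row in rows:
    print(LABEL_NAMES[row["label"]], "|", row["text"])
```

Using `csv.DictReader` rather than splitting on commas keeps the parse correct if a sentence is ever quoted because it contains a comma.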
---
## Use Cases
- Emergency call filtering
- Real-time spam detection
- Urdu NLP classification tasks
---
## Notes
- Data is synthetically generated
- Cleaned and deduplicated
- Balanced between spam and non-spam
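
The deduplication and balance notes above can be checked on a local copy. The following is a rough sketch, assuming the rows are already available as dicts with `text` and `label` keys; `summarize` is a hypothetical helper, and the rows passed to it here are placeholders, not data from the dataset.

```python
from collections import Counter


def summarize(rows):
    """Return label counts and the number of repeated texts in `rows`."""
    label_counts = Counter(row["label"] for row in rows)
    text_counts = Counter(row["text"].strip() for row in rows)
    # Each text seen n > 1 times contributes n - 1 redundant rows.
    duplicates = sum(n - 1 for n in text_counts.values() if n > 1)
    return {"label_counts": dict(label_counts), "duplicates": duplicates}


# Placeholder rows for illustration only; not taken from the dataset.
example_rows = [
    {"text": "sample a", "label": 0},
    {"text": "sample b", "label": 1},
    {"text": "sample b", "label": 1},
]
report = summarize(example_rows)
print(report)
```

On a deduplicated, balanced dataset, `duplicates` would be 0 and the two label counts would be roughly equal.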
---
## Loading Dataset
```python
from datasets import load_dataset
# Downloads from the Hugging Face Hub; returns a DatasetDict keyed by split name
dataset = load_dataset("hamza-amin/urdu-spam-dataset")
```
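
The card does not list the available splits, so one way to inspect them after loading is sketched below. `split_sizes` and `load_and_describe` are hypothetical helper names; the sketch assumes only that `load_dataset` returns a dict-like `DatasetDict` when no `split` argument is given.

```python
def split_sizes(dataset_dict):
    """Return {split_name: number_of_rows} for a DatasetDict-like mapping."""
    return {name: len(split) for name, split in dataset_dict.items()}


def load_and_describe(repo_id="hamza-amin/urdu-spam-dataset"):
    """Download the dataset and report the size of each split (needs network)."""
    # Imported lazily so split_sizes works without the `datasets` package.
    from datasets import load_dataset

    return split_sizes(load_dataset(repo_id))
```

Calling `load_and_describe()` prints nothing by itself; it returns a plain dict that can be printed or logged.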
---
## License
MIT
Provider:
hamza-amin



