
hamza-amin/urdu-spam-dataset

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/hamza-amin/urdu-spam-dataset
Resource description:
---
language: ur
tags:
- text-classification
- spam-detection
- urdu
- nlp
license: mit
task_categories:
- text-classification
task_ids:
- intent-classification
pretty_name: Urdu Spam Detection Dataset
size_categories:
- 1K<n<10K
---

# Urdu Spam Detection Dataset

## Description

This dataset is designed for classifying Urdu text into:

- **0 → Not Spam**
- **1 → Spam**

It is intended for AI-powered emergency helpline systems (e.g., 1122/911) to filter prank or irrelevant calls.

---

## Dataset Structure

**Format:** CSV

| Column | Type | Description |
|--------|------|-------------|
| text | string | Urdu sentence |
| label | int (0/1) | Spam classification |

---

## Example

```
text,label
آپ کو میں نے پہلے بھی کال کیا تھا کیا یاد ہے,1
یہ ایک ایمرجنسی ہے پلیز ایمبولینس بھیجیں,0
```

---

## Use Cases

- Emergency call filtering
- Real-time spam detection
- Urdu NLP classification tasks

---

## Notes

- Data is synthetically generated
- Cleaned and deduplicated
- Balanced between spam and non-spam

---

## Loading Dataset

```python
from datasets import load_dataset

dataset = load_dataset("hamza-amin/urdu-spam-dataset")
```

---

## License

MIT
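
As a rough illustration of the `text,label` CSV schema described under Dataset Structure, the sketch below parses the two sample rows from the Example section and counts rows per class. It uses only the Python standard library. The names `SAMPLE_CSV`, `LABEL_NAMES`, `read_rows`, and `label_counts` are hypothetical helpers introduced here for illustration and are not part of the dataset or of any library.

```python
import csv
import io
from collections import Counter

# Two sample rows copied from the Example section of the dataset card.
SAMPLE_CSV = """text,label
آپ کو میں نے پہلے بھی کال کیا تھا کیا یاد ہے,1
یہ ایک ایمرجنسی ہے پلیز ایمبولینس بھیجیں,0
"""

# Label meanings as stated in the Description section.
LABEL_NAMES = {0: "Not Spam", 1: "Spam"}


def read_rows(csv_text):
    """Parse CSV text with `text` and `label` columns into (text, label) pairs."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        label = int(record["label"])
        if label not in LABEL_NAMES:
            raise ValueError(f"unexpected label: {label}")
        rows.append((record["text"], label))
    return rows


def label_counts(rows):
    """Count how many rows fall into each named class."""
    return Counter(LABEL_NAMES[label] for _, label in rows)


rows = read_rows(SAMPLE_CSV)
counts = label_counts(rows)
for name, count in sorted(counts.items()):
    print(f"{name}: {count}")
```

The same counting step could be run on the full file to check the "balanced between spam and non-spam" note, assuming the published CSV follows the schema shown in the table.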
Provider:
hamza-amin