
karanverma19/Indian_Multilingual_Scam_Message_Dataset

Source: Hugging Face. Updated 2026-04-10. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/karanverma19/Indian_Multilingual_Scam_Message_Dataset
Description:

---
license: apache-2.0
---

# Indian Multilingual Scam Message Dataset

## Overview

This dataset contains 120 realistic SMS and text messages from Indian contexts, labeled as scam or legitimate. It reflects real-world communication patterns across Hindi, Hinglish, and English.

## Features

- 120 high-quality samples
- Multilingual (Hindi, Hinglish, English)
- Real-world inspired scam and legitimate messages
- Includes reasoning for each label
- Covers multiple domains: banking, ecommerce, telecom, utilities, finance, and government

## Dataset Structure

| Column | Description |
|--------|------------|
| message | The SMS or text message |
| label | scam or legit |
| reason | Explanation for classification |
| domain | Application domain (banking, ecommerce, etc.) |
| language | Language used (Hindi, Hinglish, English) |

## Example

| message | label |
|--------|------|
| आपका बैंक खाता सत्यापन लंबित है, कृपया तुरंत अपडेट करें | scam |
| Your OTP is 456789. Do not share it with anyone. | legit |

## Use Cases

- Scam detection systems
- Spam filtering models
- Fraud prevention tools
- Multilingual chatbot safety layers

## Motivation

Fraudulent SMS and phishing attacks are common in India, especially in multilingual and code-mixed formats. This dataset helps build AI systems that can better understand and detect such threats in real-world scenarios.

## Evaluation

This dataset can be evaluated using:

- Classification accuracy
- Precision / Recall / F1-score
- Human validation for realism and correctness

## License

Apache-2.0
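The automatic metrics named under Evaluation (accuracy, precision, recall, F1) can be computed with a few lines of plain Python. The following is a minimal sketch, not the dataset author's evaluation script: `binary_metrics` is a hypothetical helper, `scam` is assumed to be the positive class, the two rows are the ones shown in the Example table, and `y_pred` is a made-up model output used only to exercise the function.

```python
def binary_metrics(y_true, y_pred, positive="scam"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every division so empty inputs or absent classes yield 0.0.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# The two rows from the Example table (only the columns shown there).
rows = [
    {"message": "आपका बैंक खाता सत्यापन लंबित है, कृपया तुरंत अपडेट करें",
     "label": "scam"},
    {"message": "Your OTP is 456789. Do not share it with anyone.",
     "label": "legit"},
]

y_true = [row["label"] for row in rows]
y_pred = ["scam", "scam"]  # hypothetical predictions, not real model output

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With only 120 samples, per-class precision and recall are likely more informative than accuracy alone, and scores from a single train/test split should be read with caution.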
Provider:
karanverma19