BangalaBarta: A Spam / Smishing SMS Dataset in Bangla

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://data.mendeley.com/datasets/jfkfbw3gzh
Resource overview:
Description: BangalaBarta is a dataset for detecting and classifying spam and smishing (phishing via SMS) messages in Bangla. It contains 2772 SMS messages in three classes: Smishing, Promotional, and Normal SMS. The messages cover a wide range of text types sent over Bangladeshi telecommunication networks, including the operators Grameenphone, Banglalink, and Robi, among others. The dataset was curated as a representative sample of the SMS messages commonly exchanged among users in Bangladesh, which makes it useful for training and evaluating machine learning models for spam and smishing detection.

The Smishing class contains messages designed to deceive users into revealing sensitive information. The Promotional class contains marketing messages from various businesses. The Normal SMS class contains everyday communication between users that is neither malicious nor promotional.

Key Features:
Total messages: 2772
Classes: Smishing, Promotional, Normal SMS
Language: Bangla (Bengali)
Telecom networks covered: Grameenphone, Banglalink, Robi, and other major telecom services
Use cases: Spam detection, smishing identification, language-based classification models
Format: The dataset is available in a structured format (e.g., CSV, JSON) with a class label for each message.

Potential Applications:
Spam Detection: Separating unwanted marketing messages from legitimate user communication.
Smishing Detection: Classifying fraudulent SMS that attempt to steal personal or financial information.
Language Processing: Supporting the development of Bangla language models for text classification.
Telecom Security: Helping telecom service providers identify and block malicious SMS traffic.

This dataset is ideal for researchers and practitioners working on Bangla language processing, telecom security, and natural language processing (NLP), particularly in contexts where identifying harmful SMS is crucial for ensuring user safety and maintaining secure mobile communication networks.
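
As a rough illustration of how a labeled file of this kind might be inspected before training a classifier, the sketch below reads message/label pairs from CSV and counts the messages per class. The column names `text` and `label`, the exact label strings, and the sample rows are all assumptions made for illustration; the published file may use a different schema, so check it before reusing this.

```python
import csv
import io
from collections import Counter

# Class names follow the dataset description; the exact label strings and the
# column names "text" and "label" are assumptions, not the published schema.
CLASSES = {"Smishing", "Promotional", "Normal"}


def load_messages(handle, text_col="text", label_col="label"):
    """Read (text, label) pairs from a CSV file handle, rejecting unknown labels."""
    rows = []
    for record in csv.DictReader(handle):
        label = record[label_col].strip()
        if label not in CLASSES:
            raise ValueError(f"unexpected label: {label!r}")
        rows.append((record[text_col], label))
    return rows


def class_distribution(rows):
    """Count how many messages fall into each class."""
    return Counter(label for _, label in rows)


# Made-up placeholder rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """text,label
placeholder message one,Smishing
placeholder message two,Promotional
placeholder message three,Normal
placeholder message four,Normal
"""

rows = load_messages(io.StringIO(SAMPLE_CSV))
print(class_distribution(rows))
```

With the real file, the same functions could be called on `open(path, encoding="utf-8", newline="")` so that the Bangla text is decoded correctly. Checking the class counts first shows whether the three classes are balanced, which matters when choosing an evaluation metric.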
Created:
2025-02-18