rjraj111/SMS_Spam_Multilingual_Collection_Dataset
Source: Hugging Face | Updated: 2026-04-27 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/rjraj111/SMS_Spam_Multilingual_Collection_Dataset
Summary:
The SMS Spam Multilingual Collection Dataset is a collection of labeled SMS messages gathered for SMS spam research. It originally contained a single set of 5,574 English SMS messages, each tagged as ham (legitimate) or spam. The messages were later machine-translated into Hindi, German, French, Spanish, Chinese, Arabic, Bengali, Russian, Portuguese, Indonesian, Urdu, Japanese, Punjabi, Javanese, Turkish, Korean, Marathi, Ukrainian, Swedish, and Norwegian using M2M100_418M, a multilingual encoder-decoder model developed by Facebook AI. The dataset contains the multilingual text and the corresponding labels, where ham denotes non-spam text and spam denotes spam text.
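As a rough illustration of working with the ham/spam labels described above, the following Python sketch counts labels over a handful of rows. The column names `labels` and `text`, the helper names `label_counts` and `load_from_hub`, and the sample rows are all assumptions made for this example; the listing does not state the actual schema or splits, so inspect the downloaded data before relying on any of them.

```python
from collections import Counter
from typing import Iterable, Mapping


def label_counts(rows: Iterable[Mapping[str, str]], label_key: str = "labels") -> Counter:
    """Count how many rows carry each label (e.g. 'ham' vs 'spam').

    `label_key` is an assumed column name, not confirmed by the listing.
    """
    return Counter(row[label_key] for row in rows)


def load_from_hub(repo_id: str = "rjraj111/SMS_Spam_Multilingual_Collection_Dataset"):
    """Download the dataset with the Hugging Face `datasets` library.

    Needs `pip install datasets` and network access. Split and column names
    are not given in this listing, so print the result to check them.
    """
    # Imported lazily so label_counts() can be used without the library installed.
    from datasets import load_dataset
    return load_dataset(repo_id)


# Tiny in-memory stand-in rows, invented for illustration (not taken from the dataset).
sample_rows = [
    {"labels": "ham", "text": "See you at lunch?"},
    {"labels": "spam", "text": "You won a prize, reply now"},
    {"labels": "ham", "text": "Running late, sorry"},
]

counts = label_counts(sample_rows)
print(counts["ham"], counts["spam"])
```

With the real data, one would call `load_from_hub()`, check which columns hold the label and each language's text, and pass the rows of a split to `label_counts` with the matching `label_key`.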
Provider:
rjraj111



