arkamath2026/sms_spam

Name: arkamath2026/sms_spam
Creator: arkamath2026
Published: 2026-03-26 21:14:53
License: No description available

Hugging Face, updated 2026-03-26, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/arkamath2026/sms_spam

Resource summary:

---
annotations_creators:
- crowdsourced
- found
language_creators:
- crowdsourced
- found
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- extended|other-nus-sms-corpus
task_categories:
- text-classification
task_ids:
- intent-classification
paperswithcode_id: sms-spam-collection-data-set
pretty_name: SMS Spam Collection Data Set
dataset_info:
  config_name: plain_text
  features:
  - name: sms
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': ham
          '1': spam
  splits:
  - name: train
    num_bytes: 521752
    num_examples: 5574
  download_size: 358869
  dataset_size: 521752
configs:
- config_name: plain_text
  data_files:
  - split: train
    path: plain_text/train-*
  default: true
train-eval-index:
- config: plain_text
  task: text-classification
  task_id: binary_classification
  splits:
    train_split: train
  col_mapping:
    sms: text
    label: target
  metrics:
  - type: accuracy
    name: Accuracy
  - type: f1
    name: F1 macro
    args:
      average: macro
  - type: f1
    name: F1 micro
    args:
      average: micro
  - type: f1
    name: F1 weighted
    args:
      average: weighted
  - type: precision
    name: Precision macro
    args:
      average: macro
  - type: precision
    name: Precision micro
    args:
      average: micro
  - type: precision
    name: Precision weighted
    args:
      average: weighted
  - type: recall
    name: Recall macro
    args:
      average: macro
  - type: recall
    name: Recall micro
    args:
      average: micro
  - type: recall
    name: Recall weighted
    args:
      average: weighted
---

# Dataset Card for SMS Spam Collection Data Set

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
- **Repository:**
- **Paper:** Almeida, T.A., Gomez Hidalgo, J.M., Yamakami, A. Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (ACM DOCENG'11), Mountain View, CA, USA, 2011.
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The SMS Spam Collection v.1 is a public set of labeled SMS messages collected for mobile phone spam research. It consists of a single collection of 5,574 real, non-encoded English messages, each tagged as either legitimate (ham) or spam.

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

English

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

- `sms`: the text of the SMS message.
- `label`: whether the message is ham (`0`, i.e. not spam) or spam (`1`).

### Data Splits

The data is provided as a single `train` split of 5,574 examples; no validation or test split is included.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

@inproceedings{Almeida2011SpamFiltering,
  title={Contributions to the Study of SMS Spam Filtering: New Collection and Results},
  author={Tiago A. Almeida and Jose Maria Gomez Hidalgo and Akebo Yamakami},
  year={2011},
  booktitle = "Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11)",
}

### Contributions

Thanks to [@czabo](https://github.com/czabo) for adding this dataset.
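The card's metadata declares only one `train` split (5,574 examples), so anyone evaluating a spam filter on this data has to carve out a held-out set themselves. Below is a minimal sketch of a stratified hold-out split, assuming the rows have already been loaded as dicts with the card's `sms` and `label` fields (for example via `datasets.load_dataset("arkamath2026/sms_spam", "plain_text", split="train")`, which is not exercised here). The helper `stratified_split`, the 20% test fraction and the seed are illustrative assumptions, not part of the dataset.

```python
import random
from collections import defaultdict

# Label ids follow the card's `class_label` metadata: 0 = ham, 1 = spam.
LABEL_NAMES = ["ham", "spam"]


def stratified_split(records, test_fraction=0.2, seed=0):
    """Return (train, test) lists that keep roughly the same ham/spam ratio.

    Hypothetical helper: `records` is assumed to be a list of dicts shaped
    like the card's rows, e.g. {"sms": "...", "label": 0}.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # Group rows by label so each class is split independently.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)

    rng = random.Random(seed)  # local RNG: reproducible, no global state
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])  # copy before shuffling
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy stand-in for the real data: 40 ham and 10 spam messages.
toy = [{"sms": f"message {i}", "label": 1 if i % 5 == 0 else 0}
       for i in range(50)]
train, test = stratified_split(toy, test_fraction=0.2, seed=42)
print(len(train), len(test))  # → 40 10
print({name: sum(LABEL_NAMES[r["label"]] == name for r in test)
       for name in LABEL_NAMES})  # → {'ham': 8, 'spam': 2}
```

Splitting per class matters here because spam is the minority class: a plain random split can leave a small test set with noticeably fewer spam messages than the overall ratio. With the Hugging Face `datasets` library, `Dataset.train_test_split(test_size=..., stratify_by_column="label", seed=...)` provides comparable behaviour for `ClassLabel` columns such as `label`.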
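The `train-eval-index` metadata lists accuracy plus macro, micro and weighted precision, recall and F1 for the binary ham (0) / spam (1) task. As a hedged illustration of what those averages mean, here is a minimal pure-Python sketch; `classification_scores` and its output keys (such as `f1_macro` for the card's "F1 macro") are hypothetical names, and it follows the common convention of scoring an undefined 0/0 ratio as 0.

```python
from collections import Counter


def _safe_div(num, den):
    # Score an undefined ratio (0/0) as 0.0, the usual convention.
    return num / den if den else 0.0


def classification_scores(y_true, y_pred, labels=(0, 1)):
    """Accuracy plus macro, micro and weighted precision/recall/F1.

    Hypothetical helper; labels default to the card's ids (0 = ham, 1 = spam).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but wrongly
            fn[t] += 1  # true class t was missed
    support = Counter(y_true)  # number of true examples per class

    per_class = {}
    for c in labels:
        prec = _safe_div(tp[c], tp[c] + fp[c])
        rec = _safe_div(tp[c], tp[c] + fn[c])
        per_class[c] = (prec, rec, _safe_div(2 * prec * rec, prec + rec))

    scores = {"accuracy": _safe_div(sum(tp.values()), len(y_true))}
    total_support = sum(support[c] for c in labels)
    for i, metric in enumerate(("precision", "recall", "f1")):
        values = [per_class[c][i] for c in labels]
        # Macro: unweighted mean over classes.
        scores[f"{metric}_macro"] = sum(values) / len(labels)
        # Weighted: mean over classes, weighted by each class's support.
        scores[f"{metric}_weighted"] = _safe_div(
            sum(per_class[c][i] * support[c] for c in labels), total_support)

    # Micro: pool TP/FP/FN over all classes, then compute one score.
    all_tp = sum(tp[c] for c in labels)
    all_fp = sum(fp[c] for c in labels)
    all_fn = sum(fn[c] for c in labels)
    micro_p = _safe_div(all_tp, all_tp + all_fp)
    micro_r = _safe_div(all_tp, all_tp + all_fn)
    scores["precision_micro"] = micro_p
    scores["recall_micro"] = micro_r
    scores["f1_micro"] = _safe_div(2 * micro_p * micro_r, micro_p + micro_r)
    return scores


# Toy predictions: 6 ham and 4 spam messages, 7 classified correctly.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
scores = classification_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
# accuracy and every *_micro score print as 0.7000; f1_macro as 0.6703
```

For single-label data like this, with every class included, micro-averaged precision, recall and F1 all reduce to accuracy, which is why the micro scores match `accuracy` in the toy run. The macro scores are the ones that expose weak performance on the smaller spam class. In practice, scikit-learn's `sklearn.metrics.precision_recall_fscore_support(y_true, y_pred, average=...)` computes the same averages.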


Provider:

arkamath2026
