arkamath2026/sms_spam

Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/arkamath2026/sms_spam
Resource description:
---
annotations_creators:
- crowdsourced
- found
language_creators:
- crowdsourced
- found
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- extended|other-nus-sms-corpus
task_categories:
- text-classification
task_ids:
- intent-classification
paperswithcode_id: sms-spam-collection-data-set
pretty_name: SMS Spam Collection Data Set
dataset_info:
  config_name: plain_text
  features:
  - name: sms
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': ham
          '1': spam
  splits:
  - name: train
    num_bytes: 521752
    num_examples: 5574
  download_size: 358869
  dataset_size: 521752
configs:
- config_name: plain_text
  data_files:
  - split: train
    path: plain_text/train-*
  default: true
train-eval-index:
- config: plain_text
  task: text-classification
  task_id: binary_classification
  splits:
    train_split: train
  col_mapping:
    sms: text
    label: target
  metrics:
  - type: accuracy
    name: Accuracy
  - type: f1
    name: F1 macro
    args:
      average: macro
  - type: f1
    name: F1 micro
    args:
      average: micro
  - type: f1
    name: F1 weighted
    args:
      average: weighted
  - type: precision
    name: Precision macro
    args:
      average: macro
  - type: precision
    name: Precision micro
    args:
      average: micro
  - type: precision
    name: Precision weighted
    args:
      average: weighted
  - type: recall
    name: Recall macro
    args:
      average: macro
  - type: recall
    name: Recall micro
    args:
      average: micro
  - type: recall
    name: Recall weighted
    args:
      average: weighted
---

# Dataset Card for SMS Spam Collection Data Set

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
- **Repository:**
- **Paper:** Almeida, T.A., Gomez Hidalgo, J.M., Yamakami, A. Contributions to the study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (ACM DOCENG'11), Mountain View, CA, USA, 2011.
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The SMS Spam Collection v.1 is a public set of labeled SMS messages collected for mobile phone spam research. It consists of a single collection of 5,574 real, non-encoded English messages, each tagged as either legitimate (ham) or spam.

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

English

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

- `sms`: the text of the SMS message
- `label`: whether the message is ham (`0`) or spam (`1`); ham means the message is not spam

### Data Splits

The `plain_text` configuration has a single `train` split with 5,574 examples.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

@inproceedings{Almeida2011SpamFiltering,
  title={Contributions to the Study of SMS Spam Filtering: New Collection and Results},
  author={Tiago A. Almeida and Jose Maria Gomez Hidalgo and Akebo Yamakami},
  year={2011},
  booktitle = "Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11)",
}

### Contributions

Thanks to [@czabo](https://github.com/czabo) for adding this dataset.
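### Usage Sketch

The card's metadata declares a `plain_text` configuration with an `sms` string column, a `label` class column (`0` is ham, `1` is spam), and an evaluation column mapping of `sms` to `text` and `label` to `target`. The following is a minimal sketch of how those pieces might fit together, not an official loader. It assumes the Hugging Face `datasets` package and network access for the optional `load_sms_spam` step; the helper names (`to_eval_row`, `load_sms_spam`), the `target_name` key, and the two sample rows are hypothetical and invented for illustration only.

```python
from typing import Dict, List

# Taken from the card's metadata: class_label names and the eval column mapping.
LABEL_NAMES: List[str] = ["ham", "spam"]  # index 0 -> ham, index 1 -> spam
COL_MAPPING: Dict[str, str] = {"sms": "text", "label": "target"}


def to_eval_row(row: dict) -> dict:
    """Rename columns per COL_MAPPING and attach the readable label name.

    The `target_name` key is a hypothetical convenience, not part of the card.
    """
    renamed = {COL_MAPPING.get(key, key): value for key, value in row.items()}
    renamed["target_name"] = LABEL_NAMES[renamed["target"]]
    return renamed


def load_sms_spam():
    """Load the train split; needs the `datasets` package and network access."""
    # Imported lazily so the helpers above still work offline.
    from datasets import load_dataset

    return load_dataset("arkamath2026/sms_spam", "plain_text", split="train")


if __name__ == "__main__":
    # Hypothetical rows, invented for illustration; not taken from the dataset.
    examples = [
        {"sms": "See you at lunch?", "label": 0},
        {"sms": "WIN a free prize, reply now!", "label": 1},
    ]
    for row in examples:
        print(to_eval_row(row))
```

To apply the same renaming to the real data, one option would be to iterate over the object returned by `load_sms_spam()` and pass each row through `to_eval_row`.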
Provider:
arkamath2026