
tanaos/synthetic-spam-detection-dataset-italian

Source: Hugging Face. Updated 2026-03-28. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/tanaos/synthetic-spam-detection-dataset-italian
Description:
---
language:
- it
license: mit
tags:
- spam-detection
- text-classification
- content-moderation
- synthetic-data
- tanaos
pretty_name: tanaos-spam-detection-italian Training Dataset
task_categories:
- text-classification
task_ids:
- acceptability-classification
size_categories:
- 10K<n<20K
---

<p align="center">
    <img src="https://raw.githubusercontent.com/tanaos/.github/master/assets/logo.png" width="250px" alt="Tanaos – Train task specific LLMs without training data, for offline NLP and Text Classification">
</p>

# Tanaos Spam Detection Italian Training Dataset

This dataset was created synthetically by Tanaos with the [Artifex](https://github.com/tanaos/artifex) Python library.

The dataset is designed to **train and evaluate spam detection systems** in Italian, that is, models that detect, classify, or filter unsolicited commercial advertisement, fraudulent messages, or other unwanted content in text form.

Our Italian spam detection model, [tanaos-spam-detection-italian](https://huggingface.co/tanaos/tanaos-spam-detection-italian), was trained on this dataset.

## Dataset Summary

The dataset contains text samples labeled as either `0` (`not_spam`) or `1` (`spam`). The following categories are considered spam:

1. Unsolicited commercial advertisement or non-commercial proselytizing.
2. Fraudulent schemes, including get-rich-quick and pyramid schemes.
3. Phishing attempts, unrealistic offers or announcements.
4. Content with deceptive or misleading information.
5. Malware or harmful links.
6. Adult content or explicit material.
7. Excessive use of capitalization or punctuation to grab attention.

---

## How to Use

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (cached locally after the first call).
dataset = load_dataset("tanaos/synthetic-spam-detection-dataset-italian")

# Inspect the first example of the training split.
print(dataset["train"][0])
```

## Intended Use

This dataset is intended for training and evaluating spam detection models. Common use cases:

- Training machine learning models to classify text messages as spam or not spam.
- Evaluating the performance of spam detection algorithms.
- Fine-tuning pre-trained language models for spam detection tasks.
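
For the evaluation use case, a minimal sketch of scoring a classifier's predictions against the dataset's labels might look like the following. It relies only on the label convention stated in the Dataset Summary (`0` = `not_spam`, `1` = `spam`). The function name `spam_metrics` and the toy lists `gold` and `pred` are illustrative assumptions, not part of the dataset or of the Artifex library.

```python
from typing import Dict, Sequence

# Label convention taken from the dataset summary: 0 = not_spam, 1 = spam.
SPAM = 1


def spam_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall and F1 for the spam class (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == SPAM and p == SPAM)  # spam caught
    fp = sum(1 for t, p in pairs if t != SPAM and p == SPAM)  # legitimate text flagged
    fn = sum(1 for t, p in pairs if t == SPAM and p != SPAM)  # spam missed

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data standing in for real labels and model predictions (made up for illustration).
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
metrics = spam_metrics(gold, pred)
print(metrics)
```

In practice, `gold` would come from the label column of a held-out portion of the dataset and `pred` from the model under evaluation. Reporting precision and recall separately is often more informative than accuracy for spam filtering, since flagging a legitimate message and missing a spam message usually carry different costs.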
Provider:
tanaos