tanaos/synthetic-spam-detection-dataset-italian
Source: Hugging Face. Updated 2026-03-28; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/tanaos/synthetic-spam-detection-dataset-italian
Description:
---
language:
- it
license: mit
tags:
- spam-detection
- text-classification
- content-moderation
- synthetic-data
- tanaos
pretty_name: tanaos-spam-detection-italian Training Dataset
task_categories:
- text-classification
task_ids:
- acceptability-classification
size_categories:
- 10K<n<20K
---
<p align="center">
<img src="https://raw.githubusercontent.com/tanaos/.github/master/assets/logo.png" width="250px" alt="Tanaos – Train task specific LLMs without training data, for offline NLP and Text Classification">
</p>
# Tanaos Spam Detection Italian Training Dataset
This dataset was created synthetically by Tanaos with the [Artifex](https://github.com/tanaos/artifex) Python library.
The dataset is designed to **train and evaluate spam detection systems** in Italian: models that detect, classify, or filter unsolicited commercial advertising, fraudulent messages, or other unwanted text content.
Our Italian spam detection model, [tanaos-spam-detection-italian](https://huggingface.co/tanaos/tanaos-spam-detection-italian), was trained on this dataset.
## Dataset Summary
The dataset contains text samples labeled as either `0` (`not_spam`) or `1` (`spam`).
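
For code that consumes these labels, the mapping can be written down once and reused. This is a minimal sketch; the names `ID2LABEL`, `LABEL2ID`, and `label_name` are illustrative and are not part of the dataset or of Artifex.

```python
# Label mapping as stated in this card: 0 -> not_spam, 1 -> spam.
ID2LABEL = {0: "not_spam", 1: "spam"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_name(label_id: int) -> str:
    """Translate an integer label into its class name."""
    return ID2LABEL[label_id]


print(label_name(1))
```
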
The following categories are considered spam:
1. Unsolicited commercial advertisement or non-commercial proselytizing.
2. Fraudulent schemes, including get-rich-quick and pyramid schemes.
3. Phishing attempts, unrealistic offers or announcements.
4. Content with deceptive or misleading information.
5. Malware or harmful links.
6. Adult content or explicit material.
7. Excessive use of capitalization or punctuation to grab attention.
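
Category 7 lends itself to a quick illustration. The sketch below computes two simple surface features: the share of uppercase letters and the longest run of `!` or `?`. It is only an assumed heuristic for exploring samples, not the method used to label this dataset; the function name `attention_features` and the example strings are hypothetical.

```python
import re


def attention_features(text: str) -> dict:
    """Return simple 'attention-grabbing' surface features for a message."""
    letters = [ch for ch in text if ch.isalpha()]
    # Share of alphabetic characters that are uppercase (0.0 if no letters).
    upper_ratio = sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    # Length of the longest consecutive run of '!' or '?' characters.
    runs = re.findall(r"[!?]+", text)
    longest_run = max((len(run) for run in runs), default=0)
    return {"upper_ratio": upper_ratio, "longest_punct_run": longest_run}


# Hypothetical example messages, not taken from the dataset.
shouty = attention_features("OFFERTA IMPERDIBILE!!! CLICCA SUBITO")
calm = attention_features("Ci vediamo domani alle 10.")
print(shouty, calm)
```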
---
## How to Use
```python
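from datasets import load_dataset
# Download the dataset from the Hugging Face Hub (cached locally after the first call).
dataset = load_dataset("tanaos/synthetic-spam-detection-dataset-italian")
# Print the first example of the training split.
print(dataset["train"][0])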
```
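
After loading, it is usually worth checking the class balance before training. The sketch below counts examples per class over any iterable of dict-like rows, which is what iterating over a split yields. The column name `label` (and `text` in the stand-in rows) is an assumption, since this card does not list the columns; check `dataset["train"].column_names` first. The helper `label_counts` and the sample rows are hypothetical.

```python
from collections import Counter

# Label mapping as stated in this card.
ID2LABEL = {0: "not_spam", 1: "spam"}


def label_counts(rows, label_column="label"):
    """Count examples per class name; `rows` is any iterable of dict-like rows."""
    counts = Counter(row[label_column] for row in rows)
    return {ID2LABEL[label_id]: n for label_id, n in sorted(counts.items())}


# Hypothetical stand-in rows; with the real data, pass dataset["train"] instead.
sample_rows = [
    {"text": "Hai vinto un premio! Clicca qui", "label": 1},
    {"text": "La riunione è spostata a giovedì", "label": 0},
    {"text": "Guadagna 5000 euro al giorno da casa", "label": 1},
]
print(label_counts(sample_rows))
```
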
## Intended Use
This dataset is intended for training and evaluating spam detection models.
Common use cases:
- Training machine learning models to classify text messages as spam or not spam.
- Evaluating the performance of spam detection algorithms.
- Fine-tuning pre-trained language models for spam detection tasks.
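
For the evaluation use case, precision, recall, and F1 on the `spam` class are a common starting point. The following is a minimal, dependency-free sketch; the function name `spam_metrics` and the example predictions are hypothetical, and a library such as scikit-learn could be used instead.

```python
def spam_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and model predictions (1 = spam, 0 = not_spam).
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
metrics = spam_metrics(gold, predicted)
print(metrics)
```

Reporting precision and recall separately matters for spam filtering, because a false alarm (a legitimate message flagged as spam) and a miss usually carry different costs.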
Provider:
tanaos



