tanaos/synthetic-text-summarization-dataset-v1
Source: Hugging Face · Updated 2026-03-29 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/tanaos/synthetic-text-summarization-dataset-v1
Resource description:
---
language:
- en
license: mit
tags:
- text-simplification
- news-articles-summarization
- text2text-generation
- seq2seq
- nlp
- synthetic-data
- tanaos
pretty_name: tanaos-text-summarization-v1 Training Dataset
task_ids:
- text-simplification
- news-articles-summarization
- text2text-generation
size_categories:
- 15K<n<30K
---
<p align="center">
<img src="https://raw.githubusercontent.com/tanaos/.github/master/assets/logo.png" width="250px" alt="Tanaos – Train task specific LLMs without training data, for offline NLP and Text Classification">
</p>
# Tanaos Text Summarization Training Dataset
This dataset was created synthetically by Tanaos with the [Artifex](https://github.com/tanaos/artifex) Python library.
The dataset is designed to **train and evaluate text summarization systems**, that is, models that generate a concise, abstractive summary of a longer input text. It can be used to build summarization models for applications such as news summarization, document condensation, and content digestion.
Our flagship text summarization model, [tanaos-text-summarization-v1](https://huggingface.co/tanaos/tanaos-text-summarization-v1), was trained on this dataset.
## Dataset Summary
The dataset contains pairs of input texts and their corresponding abstractive summaries. Each sample consists of a `text` field with the source document and a `summary` field with a concise human-readable summary.
Text samples span various domains, including news articles, business announcements, scientific findings, local events, and product launches.
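To make the record layout concrete, the sketch below models one sample as a `TypedDict` and computes a word-level compression ratio (summary length divided by source length). The `SummarizationRecord` type, the `compression_ratio` helper, and the example record are hypothetical illustrations, not part of the dataset or of Artifex; the sketch assumes both fields are plain strings, as the description above suggests.

```python
from typing import TypedDict


class SummarizationRecord(TypedDict):
    """Assumed shape of one sample: both fields are plain strings."""
    text: str
    summary: str


def compression_ratio(record: SummarizationRecord) -> float:
    """Return summary length / source length, counted in whitespace-separated words."""
    text_words = record["text"].split()
    summary_words = record["summary"].split()
    if not text_words:
        raise ValueError("source text is empty")
    return len(summary_words) / len(text_words)


# Hypothetical record for illustration only (not taken from the dataset).
example: SummarizationRecord = {
    "text": "The city council approved a new budget on Monday that increases "
            "funding for public parks and libraries by ten percent.",
    "summary": "Council approves budget raising park and library funding.",
}

print(f"{compression_ratio(example):.2f}")  # prints 0.40 (8 words / 20 words)
```

Word counts are a rough proxy; a tokenizer-based count would be closer to what a model actually sees.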
## How to Use
```python
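from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (or load it from the local cache)
dataset = load_dataset("tanaos/synthetic-text-summarization-dataset-v1")

# Each example is a dict with a `text` field and a `summary` field
print(dataset["train"][0])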
```
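The example above reads the `train` split. If you need a held-out set for fine-tuning, a minimal sketch for carving off a validation slice and turning records into (input, target) pairs for a seq2seq model follows. The helper names, the fixed seed, and the `summarize: ` prefix (a T5-style convention) are assumptions for illustration, not something the dataset prescribes.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical T5-style task prefix; pass prefix="" if your model does not expect one.
TASK_PREFIX = "summarize: "


def to_seq2seq_pairs(
    records: Sequence[Dict[str, str]], prefix: str = TASK_PREFIX
) -> List[Tuple[str, str]]:
    """Turn {"text", "summary"} records into (model input, target) string pairs."""
    return [(prefix + r["text"], r["summary"]) for r in records]


def split_records(
    records: Sequence[Dict[str, str]], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Shuffle a copy of the records with a fixed seed and hold out a validation slice."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be strictly between 0 and 1")
    shuffled = list(records)  # copy, so the caller's sequence is left untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

Usage, assuming `dataset` was loaded as shown above: `train, val = split_records(list(dataset["train"]))`, then `train_pairs = to_seq2seq_pairs(train)`. The `datasets` library also provides `Dataset.train_test_split`, which does the same job without materializing the records as a Python list.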
## Intended Use
This dataset is meant for **training, fine-tuning, and evaluating** models for general-purpose text summarization tasks.
Common use cases:
- Summarizing news articles for quick reading and content digestion.
- Condensing long business reports or documents into executive summaries.
- Building summarization pipelines for customer support ticket triage.
- Enhancing search engines and knowledge bases with auto-generated abstracts.
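For the evaluation use case, generated summaries are commonly scored against references with ROUGE. Below is a simplified ROUGE-1 F1 (clipped unigram overlap) in pure Python. It assumes lowercase whitespace tokenization with no stemming or punctuation handling, so its scores will not exactly match reference implementations such as the `rouge-score` package or the `rouge` metric in Hugging Face `evaluate`, which are the better choice for reported numbers.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated summary and a reference summary.

    Simplifying assumption: lowercase + whitespace split, no stemming.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipped overlap).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical model output vs. reference summary.
score = rouge1_f1(
    "council approves new budget",
    "council approves budget raising park funding",
)
print(f"{score:.2f}")  # prints 0.60 (precision 3/4, recall 3/6)
```

Averaging this score over a held-out set gives a quick sanity check during fine-tuning.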
Provided by: tanaos