tanaos/synthetic-text-summarization-dataset-v1

Name: tanaos/synthetic-text-summarization-dataset-v1
Creator: tanaos
Published: 2026-03-29 12:19:59
License: No description provided

Hugging Face | Updated 2026-03-29 | Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/tanaos/synthetic-text-summarization-dataset-v1

Resource description:

---
language:
- en
license: mit
tags:
- text-simplification
- news-articles-summarization
- text2text-generation
- seq2seq
- nlp
- synthetic-data
- tanaos
pretty_name: tanaos-text-summarization-v1 Training Dataset
task_ids:
- text-simplification
- news-articles-summarization
- text2text-generation
size_categories:
- 15K<n<30K
---

<p align="center">
    <img src="https://raw.githubusercontent.com/tanaos/.github/master/assets/logo.png" width="250px" alt="Tanaos – Train task specific LLMs without training data, for offline NLP and Text Classification">
</p>

# Tanaos Text Summarization Training Dataset

This dataset was created synthetically by Tanaos with the [Artifex](https://github.com/tanaos/artifex) Python library.

The dataset is designed to **train and evaluate text summarization systems**: models that generate a concise, abstractive summary of a longer input text. It can be used to build summarization models for various applications, such as news summarization, document condensation, and content digestion.

Our flagship text summarization model, [tanaos-text-summarization-v1](https://huggingface.co/tanaos/tanaos-text-summarization-v1), was trained on this dataset.

## Dataset Summary

The dataset contains pairs of input texts and their corresponding abstractive summaries. Each sample consists of a `text` field with the source document and a `summary` field with a concise human-readable summary. Text samples span various domains, including news articles, business announcements, scientific findings, local events, and product launches.

## How to Use

```python
from datasets import load_dataset

dataset = load_dataset("tanaos/synthetic-text-summarization-dataset-v1")
print(dataset["train"][0])
```

## Intended Use

This dataset is meant for **training, fine-tuning, and evaluating** models for general-purpose text summarization tasks.

Common use cases:

- Summarizing news articles for quick reading and content digestion.
- Condensing long business reports or documents into executive summaries.
- Building summarization pipelines for customer support ticket triage.
- Enhancing search engines and knowledge bases with auto-generated abstracts.
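
Before training on `text`/`summary` pairs like those described above, it can help to check how strongly the summaries compress their sources. The following is a minimal sketch, not part of the dataset card: the helper names `compression_ratio` and `summarize_ratios` are hypothetical, the toy rows are invented stand-ins for real samples, and length is measured in whitespace-separated tokens as a simplifying assumption.

```python
from statistics import mean


def compression_ratio(sample: dict) -> float:
    """Return summary length divided by source length, in whitespace tokens."""
    text_tokens = sample["text"].split()
    summary_tokens = sample["summary"].split()
    if not text_tokens:
        raise ValueError("sample has an empty 'text' field")
    return len(summary_tokens) / len(text_tokens)


def summarize_ratios(samples) -> dict:
    """Aggregate compression ratios over an iterable of text/summary rows."""
    ratios = [compression_ratio(sample) for sample in samples]
    return {
        "count": len(ratios),
        "mean": mean(ratios),
        "min": min(ratios),
        "max": max(ratios),
    }


# Hypothetical rows that only mimic the documented schema (`text`, `summary`);
# they are NOT taken from the dataset.
toy_samples = [
    {
        "text": "The city council voted on Tuesday to approve a new budget "
                "that sets aside additional funds for road repairs next year.",
        "summary": "Council approves budget with extra road repair funds.",
    },
    {
        "text": "A local bakery announced that it will open a second shop "
                "downtown in the spring after record sales this winter.",
        "summary": "Bakery to open second downtown shop.",
    },
]

stats = summarize_ratios(toy_samples)
print(stats)
```

Assuming the `train` split shown in the How to Use example, the same helper should accept the loaded data directly, for example `summarize_ratios(dataset["train"])`, since iterating a split yields one dictionary per row.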

Provider: tanaos
