
tanaos/synthetic-text-summarization-dataset-v1

Source: Hugging Face. Updated 2026-03-29. Indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/tanaos/synthetic-text-summarization-dataset-v1
Resource description:
---
language:
- en
license: mit
tags:
- text-simplification
- news-articles-summarization
- text2text-generation
- seq2seq
- nlp
- synthetic-data
- tanaos
pretty_name: tanaos-text-summarization-v1 Training Dataset
task_ids:
- text-simplification
- news-articles-summarization
- text2text-generation
size_categories:
- 15K<n<30K
---

<p align="center">
  <img src="https://raw.githubusercontent.com/tanaos/.github/master/assets/logo.png" width="250px" alt="Tanaos – Train task specific LLMs without training data, for offline NLP and Text Classification">
</p>

# Tanaos Text Summarization Training Dataset

This dataset was created synthetically by Tanaos with the [Artifex](https://github.com/tanaos/artifex) Python library.

The dataset is designed to **train and evaluate text summarization systems**: models that generate a concise, abstractive summary of a longer input text. It can be used to build summarization models for various applications, such as news summarization, document condensation, and content digestion.

Our flagship text summarization model, [tanaos-text-summarization-v1](https://huggingface.co/tanaos/tanaos-text-summarization-v1), was trained on this dataset.

## Dataset Summary

The dataset contains pairs of input texts and their corresponding abstractive summaries. Each sample consists of a `text` field with the source document and a `summary` field with a concise human-readable summary.

Text samples span various domains, including news articles, business announcements, scientific findings, local events, and product launches.

## How to Use

```python
from datasets import load_dataset

dataset = load_dataset("tanaos/synthetic-text-summarization-dataset-v1")
print(dataset["train"][0])
```

## Intended Use

This dataset is meant for **training, fine-tuning, and evaluating** models for general-purpose text summarization tasks.

Common use cases:

- Summarizing news articles for quick reading and content digestion.
- Condensing long business reports or documents into executive summaries.
- Building summarization pipelines for customer support ticket triage.
- Enhancing search engines and knowledge bases with auto-generated abstracts.
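
Before training or evaluating on `text`/`summary` pairs, it can help to check that each summary is actually shorter than its source. The following is a minimal sketch, not part of the dataset card: the helper names `compression_ratio` and `filter_valid`, the `max_ratio` threshold, and the three sample records are all hypothetical and made up for illustration. Only the `text` and `summary` field names come from the card.

```python
def compression_ratio(sample: dict) -> float:
    """Return summary word count divided by source word count for one sample."""
    text_words = len(sample["text"].split())
    summary_words = len(sample["summary"].split())
    if text_words == 0:
        raise ValueError("sample has an empty 'text' field")
    return summary_words / text_words


def filter_valid(samples, max_ratio: float = 0.8):
    """Keep samples with non-empty fields whose summary is at most
    max_ratio times the length of the source (hypothetical threshold)."""
    kept = []
    for sample in samples:
        # Skip records with a blank source or a blank summary.
        if not sample["text"].strip() or not sample["summary"].strip():
            continue
        if compression_ratio(sample) <= max_ratio:
            kept.append(sample)
    return kept


# Made-up records in the same shape as the dataset's samples.
samples = [
    {
        "text": "The city council approved a new budget on Monday that "
                "increases funding for public transport and road repairs.",
        "summary": "Council approves budget boosting transport funding.",
    },
    {"text": "Short text.", "summary": "Short text repeated here."},
    {"text": "", "summary": "Orphan summary."},
]

valid = filter_valid(samples)
print(f"kept {len(valid)} of {len(samples)} samples")
for sample in valid:
    print(f"compression ratio: {compression_ratio(sample):.2f}")
```

The same predicate could be applied to the loaded dataset by iterating over `dataset["train"]`, since each record is a dictionary with these two fields. Whitespace word counts are a rough proxy; a tokenizer-based count would be closer to what a seq2seq model actually sees.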
Provider:
tanaos