pszemraj/summary-map-reduce-v1

Name: pszemraj/summary-map-reduce-v1
Creator: pszemraj
Published: 2024-12-05 05:07:11
License: not specified

Source: Hugging Face; updated 2024-12-05; indexed 2024-12-14

Download link:

https://hf-mirror.com/datasets/pszemraj/summary-map-reduce-v1

Description:

The summary-map-reduce-v1 dataset is designed for training text-to-text models to consolidate multiple summaries of a long document during the reduce step of map-reduce summarization.

Each example pairs `input_summaries`, the summaries of a long document's individual chunks concatenated into a single string with a whitespace delimiter, with `final_summary`, a synthetically generated consolidated and improved version of those summaries. The consolidation step focuses on merging redundant information, resolving inconsistencies, preserving narrative flow and logical order, and making general improvements.

To filter out low-quality `final_summary` examples, both the input and output summaries were embedded with gte-large-en-v1.5, and examples whose cosine similarity fell below 0.75 were removed.

The input summaries were produced by various text-to-text summarization models; the consolidated versions were generated with the Llama 3.1 70B Instruct model.
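The map-reduce flow that this dataset targets can be sketched as follows: chunk a long document, summarize each chunk independently (map), join the chunk summaries into one `input_summaries`-style string, and hand that string to a consolidation model (reduce). This is a minimal illustration, not the dataset author's pipeline. The helpers `chunk_text`, `build_input_summaries`, `split_input_summaries`, and `map_reduce_summarize` are hypothetical, fixed-size character chunking is a simplification, and the `"\n\n"` delimiter is an assumption, since the exact delimiter is not recoverable from this listing. The lambdas stand in for real summarization and consolidation models.

```python
from typing import Callable, List

# ASSUMPTION: the dataset's exact delimiter is not recoverable here; a blank
# line ("\n\n") is used purely for illustration.
DELIMITER = "\n\n"


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Hypothetical helper: split a document into fixed-size character chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def build_input_summaries(chunk_summaries: List[str], delimiter: str = DELIMITER) -> str:
    """Hypothetical helper: join per-chunk summaries into one `input_summaries`-style string."""
    return delimiter.join(s.strip() for s in chunk_summaries)


def split_input_summaries(input_summaries: str, delimiter: str = DELIMITER) -> List[str]:
    """Hypothetical helper: recover the individual chunk summaries, dropping empty pieces."""
    return [s for s in input_summaries.split(delimiter) if s.strip()]


def map_reduce_summarize(
    document: str,
    summarize_chunk: Callable[[str], str],
    consolidate: Callable[[str], str],
    chunk_size: int = 2000,
) -> str:
    """Map: summarize each chunk independently. Reduce: consolidate the joined summaries."""
    chunk_summaries = [summarize_chunk(c) for c in chunk_text(document, chunk_size)]
    return consolidate(build_input_summaries(chunk_summaries))


# Toy stand-ins for real models: "summarize" a chunk as its first word, and
# "consolidate" by listing the recovered chunk summaries. A trained reduce
# model would instead merge redundancy and smooth the narrative.
doc = "alpha " * 10 + "beta " * 10
result = map_reduce_summarize(
    doc,
    summarize_chunk=lambda chunk: chunk.split()[0],
    consolidate=lambda joined: " / ".join(split_input_summaries(joined)),
    chunk_size=30,
)
print(result)  # alpha / alpha / beta / beta
```

The toy output keeps the repeated "alpha" and "beta" entries; removing that kind of redundancy is exactly what a model trained on `final_summary` is meant to learn. Note that the join/split round trip only holds if no individual summary itself contains the delimiter.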
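The quality filter described above (drop examples whose input/output cosine similarity is below 0.75) can be sketched in plain Python. This sketch assumes one embedding vector per side of each example (the whole `input_summaries` string and the `final_summary`); the listing does not say how the inputs were embedded. The embedding step itself is omitted: in practice the vectors would come from gte-large-en-v1.5, while the toy 2-D vectors below are made up for illustration. `cosine_similarity` and `filter_pairs` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple

# Threshold taken from the dataset description.
SIMILARITY_THRESHOLD = 0.75


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Design choice for this sketch: treat a zero vector as dissimilar.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    threshold: float = SIMILARITY_THRESHOLD,
) -> List[int]:
    """Return indices of (input, output) embedding pairs to keep.

    Pairs with similarity below `threshold` are dropped, mirroring the
    "cosine similarity below 0.75 were removed" rule.
    """
    return [
        i
        for i, (inp, out) in enumerate(pairs)
        if cosine_similarity(inp, out) >= threshold
    ]


# Toy embeddings (ASSUMPTION: made-up 2-D vectors, not real gte-large-en-v1.5 output).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # cos ~ 0.995 -> kept
    ([1.0, 0.0], [0.0, 1.0]),  # cos = 0.0   -> removed
    ([1.0, 1.0], [1.0, 0.0]),  # cos ~ 0.707 -> removed (below 0.75)
]
print(filter_pairs(pairs))  # [0]
```

Keeping pairs at or above the threshold (`>=`) matches the wording "below 0.75 were removed": an example scoring exactly 0.75 survives.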
