pszemraj/summary-map-reduce-v1
Source: Hugging Face | Updated: 2024-12-05 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/pszemraj/summary-map-reduce-v1
Resource description:
The summary-map-reduce-v1 dataset is designed for training text-to-text models to consolidate multiple summaries of a long document in the reduce step of map-reduce summarization. Each example contains the chunk-level summaries of one long document, concatenated into a single string with a newline delimiter (`input_summaries`), together with a synthetically generated consolidated and improved version of them (`final_summary`). The consolidation step focuses on merging redundant information, resolving inconsistencies, preserving narrative flow and logical order, and making general improvements. To filter out low-quality `final_summary` examples, both the input and output summaries were embedded with gte-large-en-v1.5, and examples with a cosine similarity below 0.75 were removed. The input summaries were generated with various text-to-text summarization models, and the consolidated versions were generated with the Llama 3.1 70B Instruct model.
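The record layout and the similarity filter described above can be illustrated with a minimal sketch. It is not the dataset author's pipeline: the helper names (`join_chunk_summaries`, `cosine_similarity`, `keep_example`) are hypothetical, the exact newline sequence used as the delimiter is an assumption, and the embeddings are taken as plain lists of floats rather than computed with gte-large-en-v1.5.

```python
import math

# Assumption: the listing only shows a line break as the delimiter; the exact
# newline sequence is not recoverable here, so "\n\n" is a placeholder.
DELIMITER = "\n\n"

# Threshold stated in the description: examples below 0.75 were removed.
SIMILARITY_THRESHOLD = 0.75


def join_chunk_summaries(chunk_summaries, delimiter=DELIMITER):
    """Build an `input_summaries`-style string from per-chunk summaries.

    Empty or whitespace-only chunks are skipped (an illustrative choice).
    """
    cleaned = [s.strip() for s in chunk_summaries if s.strip()]
    return delimiter.join(cleaned)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_example(input_embedding, output_embedding, threshold=SIMILARITY_THRESHOLD):
    """Keep an example only if its similarity is not below the threshold."""
    return cosine_similarity(input_embedding, output_embedding) >= threshold
```

In practice the two vectors passed to `keep_example` would be the embeddings of `input_summaries` and `final_summary`; how those embeddings are produced is outside this sketch.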
Provided by:
pszemraj



