
pszemraj/summary-map-reduce-v1

Source: Hugging Face. Updated: 2024-12-05. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/pszemraj/summary-map-reduce-v1
Resource description:
The dataset summary-map-reduce-v1 is designed for training text-to-text models to consolidate multiple summaries from a long document in the reduce step of map-reduce summarization. Each example contains chunked summaries from a long document, concatenated into a single string with ` ` as delimiter (`input_summaries`), and their synthetically generated consolidated/improved version (`final_summary`). The consolidation step focuses on merging redundant information, resolving inconsistencies, preserving narrative flow and logical order, and making general improvements. To validate and filter out low-quality `final_summary` examples, both input and output summaries were embedded using gte-large-en-v1.5, and examples with a cosine similarity below 0.75 were removed. The dataset sources include summaries generated using various text-to-text summarization models and consolidated versions created using the Llama 3.1 70B Instruct model.
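The example layout and the similarity filter described above can be sketched in a few lines of Python. This is an illustration only, not the dataset author's actual pipeline: the helper names (`build_example`, `cosine_similarity`, `filter_by_similarity`, `toy_embed`) are hypothetical, the blank-line delimiter is an assumption because the description shows the delimiter only as an empty placeholder, and `toy_embed` is a word-count stand-in for the real gte-large-en-v1.5 embeddings so that the snippet runs without downloading a model.

```python
import math
from typing import Callable, Dict, List, Sequence

# Assumption: chunk summaries are separated by a blank line. The description
# names a delimiter but its characters are not visible in the text.
DELIMITER = "\n\n"


def build_example(chunk_summaries: Sequence[str], final_summary: str,
                  delimiter: str = DELIMITER) -> Dict[str, str]:
    """Join chunk summaries into one string, mirroring the two dataset fields."""
    return {
        "input_summaries": delimiter.join(s.strip() for s in chunk_summaries),
        "final_summary": final_summary.strip(),
    }


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(examples: List[Dict[str, str]],
                         embed: Callable[[str], Sequence[float]],
                         threshold: float = 0.75) -> List[Dict[str, str]]:
    """Keep examples whose input/output similarity is at least the threshold."""
    kept = []
    for ex in examples:
        sim = cosine_similarity(embed(ex["input_summaries"]),
                                embed(ex["final_summary"]))
        if sim >= threshold:  # examples below the threshold are removed
            kept.append(ex)
    return kept


# Hypothetical stand-in for a real embedding model: counts of a tiny vocabulary.
VOCAB = ["cat", "dog", "fish"]


def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


good = build_example(["cat dog", "dog fish"], "cat dog fish")
bad = build_example(["cat cat"], "fish")
kept = filter_by_similarity([good, bad], toy_embed)
print(len(kept))  # only `good` passes the 0.75 threshold
```

In a real run, `toy_embed` would be replaced by a function that returns embeddings from the model named in the description; the threshold of 0.75 is the one stated there.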
Provided by:
pszemraj