Data Sheet 1_Divide and summarize: improve SLM text summarization.pdf

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://figshare.com/articles/dataset/Data_Sheet_1_Divide_and_summarize_improve_SLM_text_summarization_pdf/29756432

Description:

Introduction: Text summarization is a longstanding challenge in natural language processing, with recent advancements driven by the adoption of Large Language Models (LLMs) and Small Language Models (SLMs). Despite these developments, issues such as the “Lost in the Middle” problem, in which LLMs tend to overlook information in the middle of lengthy prompts, persist. Traditional summarization, often termed the “Stuff” method, processes an entire text in a single pass. In contrast, the “Map” method divides the text into segments, summarizes each independently, and then synthesizes these partial summaries into a final output, potentially mitigating the “Lost in the Middle” issue. This study investigates whether the Map method outperforms the Stuff method for texts that fit within the context window of SLMs and assesses its effectiveness in addressing the “Lost in the Middle” problem.

Methods: We conducted a two-part investigation: first, a simulation study using generated texts, paired with an automated fact-retrieval evaluation to eliminate the need for human assessment; second, a practical study summarizing scientific papers.

Results: Both studies show that the Map method produces summaries that are at least as accurate as those from the Stuff method. Notably, the Map method excels at retaining key facts from the beginning and middle of texts, unlike the Stuff method, suggesting its superiority for SLM-based summarization of smaller texts. Additionally, SLMs using the Map method achieved performance comparable to LLMs using the Stuff method, highlighting its practical utility.

Discussion: Both the theoretical and the practical study suggest that using the Map method for summarization with SLMs addresses the “Lost in the Middle” problem and outperforms the Stuff method.
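The difference between the two methods described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word-count segmentation, the `max_words` parameter, and the `toy_summarizer` stand-in (used here in place of a real SLM call) are all assumptions made for illustration only.

```python
from typing import Callable, List

# Any function that maps a text to a shorter text. In practice this would
# wrap a call to a language model; here it is left abstract (assumption).
Summarizer = Callable[[str], str]


def split_into_segments(text: str, max_words: int) -> List[str]:
    """Split text into consecutive segments of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def stuff_summarize(text: str, summarize: Summarizer) -> str:
    """'Stuff' method: summarize the entire text in a single pass."""
    return summarize(text)


def map_summarize(text: str, summarize: Summarizer, max_words: int = 200) -> str:
    """'Map' method: summarize each segment independently, then
    synthesize the partial summaries into one final summary."""
    segments = split_into_segments(text, max_words)
    partial_summaries = [summarize(segment) for segment in segments]
    return summarize("\n".join(partial_summaries))


def toy_summarizer(text: str) -> str:
    """Hypothetical stand-in for a model: keeps the first word of each line."""
    return " ".join(line.split()[0] for line in text.splitlines() if line.strip())
```

With the toy stand-in, the Map path keeps one item from every segment, while the Stuff path sees the text only once. This mirrors the structural difference between the two methods, not the behaviour of a real model.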

Created:

2025-08-01
