five

Data Sheet 1_Divide and summarize: improve SLM text summarization.pdf

收藏
NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://figshare.com/articles/dataset/Data_Sheet_1_Divide_and_summarize_improve_SLM_text_summarization_pdf/29756432
下载链接
链接失效反馈
官方服务:
资源简介:
IntroductionText summarization is a longstanding challenge in natural language processing, with recent advancements driven by the adoption of Large Language Models (LLMs) and Small Language Models (SLMs). Despite these developments, issues such as the “Lost in the Middle” problem—where LLMs tend to overlook information in the middle of lengthy prompts—persist. Traditional summarization, often termed the “Stuff” method, processes an entire text in a single pass. In contrast, the “Map” method divides the text into segments, summarizes each independently, and then synthesizes these partial summaries into a final output, potentially mitigating the “Lost in the Middle” issue. This study investigates whether the Map method outperforms the Stuff method for texts that fit within the context window of SLMs and assesses its effectiveness in addressing the “Lost in the Middle” problem. MethodsWe conducted a two-part investigation: first, a simulation study using generated texts, paired with an automated fact-retrieval evaluation to eliminate the need for human assessment; second, a practical study summarizing scientific papers. ResultsResults from both studies demonstrate that the Map method produces summaries that are at least as accurate as those from the Stuff method. Notably, the Map method excels at retaining key facts from the beginning and middle of texts, unlike the Stuff method, suggesting its superiority for SLM-based summarization of smaller texts. Additionally, SLMs using the Map method achieved performance comparable to LLMs using the Stuff method, highlighting its practical utility. DiscussionBoth theoretical and practical studies suggest that using Map method for summarization with SLM allowed to address the “Lost in the Middle” problem and outperform Stuff method.
创建时间:
2025-08-01
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作