Orange/POSUM_BENCH

Name: Orange/POSUM_BENCH
Creator: Orange
Published: 2026-01-27 09:03:28
License: No description available

Hugging Face: updated 2026-01-27; indexed 2026-02-07

Download link:

https://hf-mirror.com/datasets/Orange/POSUM_BENCH

Description:

The PoSum Bench dataset contains conversations together with extractive and abstractive summaries. Each instance pairs a single conversation with a summary produced by one specific model or extractive strategy. The dataset accompanies the PoSum Bench paper, which presents the first comprehensive benchmark for testing positional bias in conversational summarization. It was built from diverse conversational sources, including English and French meeting transcripts, multi-turn dialogues, sales call conversations, and Twitter dialogues. All conversations underwent standardized preprocessing, including turn segmentation and format normalization. For each conversation, we generated extractive summaries (using leading, recency, and middle-random strategies at extraction ratios of 15%, 25%, and 35%) and abstractive summaries (using 10 different LLMs with a unified prompt).

The dataset is divided into two language subsets: English (en) and French (fr). Each sample contains fields such as a unique identifier, language code, conversation text, summary text, list of summary sentences, conversation length class, the model or strategy that generated the summary, model family, summary type (abstractive or extractive), ignored sentence indices, leading bias score, recency bias score, bias magnitude, bias direction, and the token count of the original conversation.
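The three extractive strategies named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `extract_indices`, the rounding rule for the number of selected sentences, and the definition of "middle" (sentences outside the leading and trailing selections) are all assumptions made for illustration.

```python
import random


def extract_indices(n_sentences, ratio, strategy, seed=0):
    """Return sorted indices of the sentences kept by one extractive strategy.

    Hypothetical helper: the rounding rule and the meaning of "middle"
    are assumptions, not taken from the PoSum Bench paper.
    """
    if n_sentences <= 0:
        return []
    # Assumption: keep round(n * ratio) sentences, but at least one.
    k = min(n_sentences, max(1, round(n_sentences * ratio)))

    if strategy == "leading":
        # Keep the first k sentences of the conversation.
        return list(range(k))
    if strategy == "recency":
        # Keep the last k sentences of the conversation.
        return list(range(n_sentences - k, n_sentences))
    if strategy == "middle_random":
        # Assumption: "middle" means everything outside the leading-k
        # and recency-k selections; sample k sentences from that pool.
        pool = list(range(k, n_sentences - k))
        if len(pool) < k:
            # Pool too small: fall back to the k most central sentences.
            start = (n_sentences - k) // 2
            return list(range(start, start + k))
        return sorted(random.Random(seed).sample(pool, k))
    raise ValueError(f"unknown strategy: {strategy!r}")


def extractive_summary(sentences, ratio, strategy, seed=0):
    """Select sentences from a conversation, preserving their original order."""
    keep = extract_indices(len(sentences), ratio, strategy, seed)
    return [sentences[i] for i in keep]
```

For a 20-sentence conversation at a 25% ratio, this sketch keeps sentences 0 to 4 for the leading strategy, sentences 15 to 19 for the recency strategy, and five randomly chosen sentences from positions 5 to 14 for the middle-random strategy.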

Provider:

Orange
