SamSum-Pref

Name: SamSum-Pref
Creator: maas
Published: 2025-12-03 09:40:43
License: No description provided

ModelScope Community, updated 2025-12-03, indexed 2025-11-22

Download link:

https://modelscope.cn/datasets/dada122/SamSum-Pref


Overview:

# SamSum-Pref Dataset

SamSum-Pref is a preference-aligned dialogue summarization dataset constructed by sampling from **dadastory/SummOrchestra-Qwen3-8B-GRPO-BRL-SAMSUM** and filtering the samples using **DeepSeek-V3** as the evaluator. Preference scoring follows the **AnythingReward** evaluation paradigm, adapted to a strict rubric for dialogue-summary quality.

## Evaluation Principles

Each sampled summary is scored according to the following weighted criteria:

1. **Key Information Coverage (40%)**
   - Captures core elements: request/proposal, refusal, insistence, and implied motivation.
   - Missing any major element is a critical error.
2. **Inference & Implicit Understanding (30%)**
   - Correctly reflects implied attitudes or emotional tone.
   - Encourages reasonable inference; penalizes fabrication.
3. **Faithfulness & Precision (20%)**
   - No hallucinations; meaning preserved.
   - Summary must remain strictly grounded in the dialogue.
4. **Conciseness & Clarity (10%)**
   - Brief, well-structured, readable.
   - Verbosity lowers the score.

**Conflict resolution priority:** Key coverage **>** Faithfulness **>** Inference **>** Clarity.

## Sampling & Filtering

- Ten samples are randomly drawn per batch from the base model.
- DeepSeek-V3 provides a 1–5 preference score using the above rubric.
- Only summaries with **score = 5** and judged **better than the original SamSum summary** in faithfulness and human preference alignment are retained.

## Data Format

Each accepted entry is stored as a dictionary:

```python
{
    "system_prompt": system_prompt,
    "instruction": instruction,
    "reason_content": reason_content,
    "summary": summary
}
```

## Purpose

SamSum-Pref provides a high-quality, preference-filtered benchmark for training and evaluating dialogue summarization models with strong grounding, human-like judgment, and improved alignment over the original SamSum dataset.
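
The weighted rubric and the retention rule described under Evaluation Principles and Sampling & Filtering can be sketched roughly as follows. This is an illustration under stated assumptions, not the dataset authors' pipeline: the source does not say how the four criterion weights are combined into the 1–5 score, so a plain weighted mean of per-criterion 1–5 scores is assumed here, and the names `weighted_score`, `Judgement`, `keep`, and `filter_batch` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping

# Criterion weights as listed in the rubric (40% / 30% / 20% / 10%).
WEIGHTS = {
    "key_coverage": 0.40,
    "inference": 0.30,
    "faithfulness": 0.20,
    "clarity": 0.10,
}


def weighted_score(criterion_scores: Mapping[str, float]) -> float:
    """Weighted mean of per-criterion scores.

    Assumption: each criterion is scored on the same 1-5 scale and the
    overall score is their weighted mean; the source does not confirm this.
    """
    missing = set(WEIGHTS) - set(criterion_scores)
    if missing:
        raise ValueError(f"missing criterion scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * criterion_scores[name] for name in WEIGHTS)


@dataclass
class Judgement:
    """Hypothetical record of one evaluator verdict on one sampled summary."""
    summary: str
    score: int                   # overall 1-5 preference score from the evaluator
    better_than_reference: bool  # judged better than the original SamSum summary


def keep(judgement: Judgement) -> bool:
    """Retention rule: score of exactly 5 and better than the reference."""
    return judgement.score == 5 and judgement.better_than_reference


def filter_batch(judgements: Iterable[Judgement]) -> List[Judgement]:
    """Return only the judgements that pass the retention rule."""
    return [j for j in judgements if keep(j)]
```

With this rule, a batch of ten sampled summaries may yield anywhere from zero to ten retained entries, since both conditions must hold for each summary independently.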

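
A record can be checked against the four fields listed under Data Format with a small helper like the one below. It is a minimal sketch: the field names come from the dataset description, but the requirement that each value be a non-empty string is an assumption (the source does not state field types), and `validate_entry` is a hypothetical name.

```python
from typing import Any, List, Mapping

# Field names as listed in the dataset description.
REQUIRED_FIELDS = ("system_prompt", "instruction", "reason_content", "summary")


def validate_entry(entry: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one entry (empty list means OK).

    Assumption: every field should be a non-empty string.
    """
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], str):
            problems.append(f"field is not a string: {field}")
        elif not entry[field].strip():
            problems.append(f"field is empty: {field}")
    return problems


# Usage with a made-up record (the values are placeholders, not real data).
example = {
    "system_prompt": "You summarize dialogues.",
    "instruction": "Summarize the following dialogue.",
    "reason_content": "placeholder reasoning text",
    "summary": "placeholder summary text",
}
issues = validate_entry(example)
```

Returning a list of problems rather than raising on the first one makes it easier to report every defect in a record in a single pass.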

Provider:

maas

Created:

2025-11-17
