
SamSum-Pref

ModelScope Community, updated 2025-12-03, indexed 2025-11-22
Download link:
https://modelscope.cn/datasets/dada122/SamSum-Pref
Resource description:
# SamSum-Pref Dataset

SamSum-Pref is a preference-aligned dialogue summarization dataset constructed by sampling from **dadastory/SummOrchestra-Qwen3-8B-GRPO-BRL-SAMSUM**, and filtering samples using **DeepSeek-V3** as the evaluator. Preference scoring follows the **AnythingReward** evaluation paradigm, adapted to a strict rubric for dialogue-summary quality.

## Evaluation Principles

Each sampled summary is scored according to the following weighted criteria:

1. **Key Information Coverage (40%)**
   - Captures core elements: request/proposal, refusal, insistence, and implied motivation.
   - Missing any major element is a critical error.
2. **Inference & Implicit Understanding (30%)**
   - Correctly reflects implied attitudes or emotional tone.
   - Encourages reasonable inference; penalizes fabrication.
3. **Faithfulness & Precision (20%)**
   - No hallucinations; meaning preserved.
   - Summary must remain strictly grounded in the dialogue.
4. **Conciseness & Clarity (10%)**
   - Brief, well-structured, readable.
   - Verbosity lowers the score.

**Conflict resolution priority:** Key coverage **>** Faithfulness **>** Inference **>** Clarity.

## Sampling & Filtering

- Ten samples are randomly drawn per batch from the base model.
- DeepSeek-V3 provides a 1–5 preference score using the above rubric.
- Only summaries with **score = 5** and judged **better than the original SamSum summary** in faithfulness and human preference alignment are retained.

## Data Format

Each accepted entry is stored as a dictionary:

```python
{
    "system_prompt": system_prompt,
    "instruction": instruction,
    "reason_content": reason_content,
    "summary": summary
}
```

## Purpose

SamSum-Pref provides a high-quality, preference-filtered benchmark for training and evaluating dialogue summarization models with strong grounding, human-like judgment, and improved alignment over the original SamSum dataset.
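
The rubric weights, the conflict-resolution priority, and the retention rule described above can be illustrated with a small sketch. This is not the authors' pipeline: the card does not say how DeepSeek-V3 combines the four criteria into its 1–5 score, so the weighted mean below is an assumption, and `Candidate`, `weighted_score`, `priority_key`, `is_accepted`, `to_entry`, and `filter_batch` are hypothetical names introduced only for illustration. The evaluator call itself is left out; its verdict is taken as given input.

```python
from dataclasses import dataclass

# Weights and priority order come from the rubric in the card.
# The criterion keys are illustrative names, not fields of the dataset.
WEIGHTS = {
    "key_coverage": 0.40,
    "inference": 0.30,
    "faithfulness": 0.20,
    "clarity": 0.10,
}
PRIORITY = ("key_coverage", "faithfulness", "inference", "clarity")


def weighted_score(criteria: dict) -> float:
    """Weighted mean of per-criterion scores (assumed to be on a 1-5 scale)."""
    return sum(WEIGHTS[name] * criteria[name] for name in WEIGHTS)


def priority_key(criteria: dict) -> tuple:
    """Tuple that orders candidates by the conflict-resolution priority."""
    return tuple(criteria[name] for name in PRIORITY)


@dataclass
class Candidate:
    """One sampled summary plus the evaluator's verdict (hypothetical container)."""
    summary: str
    reason_content: str    # passed through unchanged into the stored entry
    score: int             # overall 1-5 preference score from the evaluator
    beats_reference: bool  # judged better than the original SamSum summary


def is_accepted(c: Candidate) -> bool:
    """Retention rule from the card: score of exactly 5 and better than the reference."""
    return c.score == 5 and c.beats_reference


def to_entry(c: Candidate, system_prompt: str, instruction: str) -> dict:
    """Build a record with the four fields listed under Data Format."""
    return {
        "system_prompt": system_prompt,
        "instruction": instruction,
        "reason_content": c.reason_content,
        "summary": c.summary,
    }


def filter_batch(candidates: list, system_prompt: str, instruction: str) -> list:
    """Keep only the candidates in a batch that pass the retention rule."""
    return [to_entry(c, system_prompt, instruction) for c in candidates if is_accepted(c)]
```

Comparing `priority_key` tuples gives one way to apply the stated priority when two candidates disagree across criteria: a summary with higher key coverage ranks first regardless of the lower-priority criteria. The acceptance check is kept separate from the scoring helpers because the card states the retention rule only in terms of the evaluator's final score and its comparison with the original SamSum summary.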

Provider:
maas
Created:
2025-11-17