nivvis/eq-dpo

Name: nivvis/eq-dpo
Creator: nivvis
Published: 2026-03-19 18:02:57
License: 暂无描述

Hugging Face2026-03-19 更新2026-03-29 收录

下载链接：

https://hf-mirror.com/datasets/nivvis/eq-dpo

下载链接

链接失效反馈

官方服务：

资源简介：

--- license: apache-2.0 task_categories: - text-generation - conversational tags: - dpo - empathy - emotional-intelligence - preference-learning - eq language: - en size_categories: - 1K<n<10K --- # EQ-DPO: Empathetic Conversation Preference Pairs Turn-level preference pairs for training emotionally intelligent language models via DPO (Direct Preference Optimization). ## Dataset Description 2,880 preference pairs extracted from high-quality multi-turn empathetic conversations. Each pair compares a **chosen** response (from a top-tier model) against a **rejected** response (from the target model) at the same conversational context. ### What makes this dataset different - **Turn-level, not conversation-level** — each pair shares the exact same conversational prefix. The only variable is the response. - **Margin-filtered** — a logit-probe judge scored each pair. Only pairs where the judge was confident (margin > 0.15) are included. Average margin: 0.38. - **Grounded in therapeutic frameworks** — conversations were generated using role cards informed by Rogers' Core Conditions, Motivational Interviewing (OARS), and Nonviolent Communication (NVC). - **Multiple supporter personas** — warm friend, peer support volunteer, therapist. All score equally; authenticity matters more than register. ## Fields | Field | Description | |-------|-------------| | `system` | System prompt (supporter persona / role card) | | `prompt` | Conversation prefix as JSON array of `{role, content}` messages | | `chosen` | The preferred supporter response | | `rejected` | The lower-quality alternative | | `margin` | Judge confidence: P(chosen) - 0.5, range 0.15-0.50 | ## Usage ```python from datasets import load_dataset ds = load_dataset("nivvis/eq-dpo") # For DPO training for example in ds["train"]: system = example["system"] prefix = json.loads(example["prompt"]) # list of {role, content} chosen = example["chosen"] rejected = example["rejected"] margin = example["margin"] # optional: use for margin-weighted DPO ``` ## Generation Pipeline 1. Source posts ranked via Swiss-style Elo tournament (logit-probe A/B judging) 2. Multi-turn conversations synthesized turn-by-turn using multiple models as independent agents 3. Top 20% conversations (by Elo) selected as gold 4. At each supporter turn in gold conversations, 5 alternative responses generated at varying temperatures 5. Each alternative judged against the gold response; worst alternative (highest margin) selected as rejected ## Statistics - **Pairs**: 2,880 - **Source conversations**: 354 (top 20% by Elo) - **Avg conversation prefix**: 10 turns - **Avg chosen length**: 379 chars - **Avg rejected length**: 392 chars - **Margin**: avg 0.38, median 0.41 - **Supporter personas**: warm_friend, peer_support, therapist (equal distribution)

提供机构：

nivvis

5,000+

优质数据集

54 个

任务类型

进入经典数据集