five

nivvis/eq-dpo

收藏
Hugging Face2026-03-19 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/nivvis/eq-dpo
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 task_categories: - text-generation - conversational tags: - dpo - empathy - emotional-intelligence - preference-learning - eq language: - en size_categories: - 1K<n<10K --- # EQ-DPO: Empathetic Conversation Preference Pairs Turn-level preference pairs for training emotionally intelligent language models via DPO (Direct Preference Optimization). ## Dataset Description 2,880 preference pairs extracted from high-quality multi-turn empathetic conversations. Each pair compares a **chosen** response (from a top-tier model) against a **rejected** response (from the target model) at the same conversational context. ### What makes this dataset different - **Turn-level, not conversation-level** — each pair shares the exact same conversational prefix. The only variable is the response. - **Margin-filtered** — a logit-probe judge scored each pair. Only pairs where the judge was confident (margin > 0.15) are included. Average margin: 0.38. - **Grounded in therapeutic frameworks** — conversations were generated using role cards informed by Rogers' Core Conditions, Motivational Interviewing (OARS), and Nonviolent Communication (NVC). - **Multiple supporter personas** — warm friend, peer support volunteer, therapist. All score equally; authenticity matters more than register. ## Fields | Field | Description | |-------|-------------| | `system` | System prompt (supporter persona / role card) | | `prompt` | Conversation prefix as JSON array of `{role, content}` messages | | `chosen` | The preferred supporter response | | `rejected` | The lower-quality alternative | | `margin` | Judge confidence: P(chosen) - 0.5, range 0.15-0.50 | ## Usage ```python from datasets import load_dataset ds = load_dataset("nivvis/eq-dpo") # For DPO training for example in ds["train"]: system = example["system"] prefix = json.loads(example["prompt"]) # list of {role, content} chosen = example["chosen"] rejected = example["rejected"] margin = example["margin"] # optional: use for margin-weighted DPO ``` ## Generation Pipeline 1. Source posts ranked via Swiss-style Elo tournament (logit-probe A/B judging) 2. Multi-turn conversations synthesized turn-by-turn using multiple models as independent agents 3. Top 20% conversations (by Elo) selected as gold 4. At each supporter turn in gold conversations, 5 alternative responses generated at varying temperatures 5. Each alternative judged against the gold response; worst alternative (highest margin) selected as rejected ## Statistics - **Pairs**: 2,880 - **Source conversations**: 354 (top 20% by Elo) - **Avg conversation prefix**: 10 turns - **Avg chosen length**: 379 chars - **Avg rejected length**: 392 chars - **Margin**: avg 0.38, median 0.41 - **Supporter personas**: warm_friend, peer_support, therapist (equal distribution)
提供机构:
nivvis
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作