nivvis/eq-dpo
收藏Hugging Face2026-03-19 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/nivvis/eq-dpo
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
task_categories:
- text-generation
- conversational
tags:
- dpo
- empathy
- emotional-intelligence
- preference-learning
- eq
language:
- en
size_categories:
- 1K<n<10K
---
# EQ-DPO: Empathetic Conversation Preference Pairs
Turn-level preference pairs for training emotionally intelligent language models via DPO (Direct Preference Optimization).
## Dataset Description
2,880 preference pairs extracted from high-quality multi-turn empathetic conversations. Each pair compares a **chosen** response (from a top-tier model) against a **rejected** response (from the target model) at the same conversational context.
### What makes this dataset different
- **Turn-level, not conversation-level** — each pair shares the exact same conversational prefix. The only variable is the response.
- **Margin-filtered** — a logit-probe judge scored each pair. Only pairs where the judge was confident (margin > 0.15) are included. Average margin: 0.38.
- **Grounded in therapeutic frameworks** — conversations were generated using role cards informed by Rogers' Core Conditions, Motivational Interviewing (OARS), and Nonviolent Communication (NVC).
- **Multiple supporter personas** — warm friend, peer support volunteer, therapist. All score equally; authenticity matters more than register.
## Fields
| Field | Description |
|-------|-------------|
| `system` | System prompt (supporter persona / role card) |
| `prompt` | Conversation prefix as JSON array of `{role, content}` messages |
| `chosen` | The preferred supporter response |
| `rejected` | The lower-quality alternative |
| `margin` | Judge confidence: P(chosen) - 0.5, range 0.15-0.50 |
## Usage
```python
from datasets import load_dataset
ds = load_dataset("nivvis/eq-dpo")
# For DPO training
for example in ds["train"]:
system = example["system"]
prefix = json.loads(example["prompt"]) # list of {role, content}
chosen = example["chosen"]
rejected = example["rejected"]
margin = example["margin"] # optional: use for margin-weighted DPO
```
## Generation Pipeline
1. Source posts ranked via Swiss-style Elo tournament (logit-probe A/B judging)
2. Multi-turn conversations synthesized turn-by-turn using multiple models as independent agents
3. Top 20% conversations (by Elo) selected as gold
4. At each supporter turn in gold conversations, 5 alternative responses generated at varying temperatures
5. Each alternative judged against the gold response; worst alternative (highest margin) selected as rejected
## Statistics
- **Pairs**: 2,880
- **Source conversations**: 354 (top 20% by Elo)
- **Avg conversation prefix**: 10 turns
- **Avg chosen length**: 379 chars
- **Avg rejected length**: 392 chars
- **Margin**: avg 0.38, median 0.41
- **Supporter personas**: warm_friend, peer_support, therapist (equal distribution)
提供机构:
nivvis



