sumuks/coval-world-prefs
Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sumuks/coval-world-prefs
Description:
---
pretty_name: Coval World Prefs
language:
- en
license: other
task_categories:
- text-generation
tags:
- dpo
- preference-optimization
- rankings
- openai
- coval
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: train-00000-of-00001.parquet
  - split: test
    path: test-00000-of-00001.parquet
---
# Dataset Card for Coval World Prefs
## Dataset Summary
This dataset is derived from `openai/coval` by using only the annotators' `world` rankings.
For each prompt, the construction script aggregates all available `world` ranking strings into a mean rank per response label, then emits pairwise DPO-style preferences in which the response with the lower (better) mean rank becomes `chosen`.
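The aggregation step can be sketched in Python as follows. This is a minimal illustration, not the dataset's actual construction script: it assumes each `world` assessment has already been parsed into a `{label: rank}` dict, that a label's mean is taken over the assessments that rank it, and that exactly equal mean ranks produce no pair. The helper names `aggregate_mean_ranks` and `preference_pairs` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def aggregate_mean_ranks(assessments: list[dict[str, float]]) -> dict[str, float]:
    """Average each response label's rank over the assessments that rank it (assumption)."""
    ranks: dict[str, list[float]] = {}
    for assessment in assessments:
        for label, rank in assessment.items():
            ranks.setdefault(label, []).append(rank)
    return {label: mean(values) for label, values in ranks.items()}


def preference_pairs(mean_ranks: dict[str, float]) -> list[tuple[str, str, float]]:
    """Emit (chosen, rejected, rank_margin) for every pair with distinct mean ranks."""
    pairs = []
    for a, b in combinations(sorted(mean_ranks), 2):
        if mean_ranks[a] == mean_ranks[b]:
            continue  # equal mean ranks imply no strict preference, so no pair
        # The lower mean rank is the better response and becomes `chosen`.
        chosen, rejected = (a, b) if mean_ranks[a] < mean_ranks[b] else (b, a)
        pairs.append((chosen, rejected, mean_ranks[rejected] - mean_ranks[chosen]))
    return pairs


# Toy example with two hypothetical assessments of three responses.
world_assessments = [
    {"A": 1.0, "B": 2.0, "C": 3.0},
    {"A": 2.0, "B": 1.0, "C": 3.0},
]
mean_ranks = aggregate_mean_ranks(world_assessments)
print(mean_ranks)  # {'A': 1.5, 'B': 1.5, 'C': 3.0}
print(preference_pairs(mean_ranks))  # [('A', 'C', 1.5), ('B', 'C', 1.5)]
```

Averaging across annotators before pairing means that disagreements (here, A versus B) cancel out rather than producing contradictory pairs.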
## Dataset Structure
- Train rows: 5762
- Test rows: 637
- Total rows: 6399
- Prompt-level split seed: `7`
- Test fraction: `0.1`
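A prompt-level split with this seed and test fraction might look like the sketch below. The exact procedure used for this dataset is not documented, so the shuffling method (`random.Random(seed).shuffle` over sorted prompts), the rounding, and the function name `split_by_prompt` are all assumptions; the point is that whole prompts, not individual pairs, are assigned to a split.

```python
import random


def split_by_prompt(rows: list[dict], seed: int = 7, test_fraction: float = 0.1):
    """Assign every pair of a given prompt to the same split (hypothetical procedure)."""
    prompts = sorted({row["prompt"] for row in rows})  # deterministic base order
    random.Random(seed).shuffle(prompts)
    n_test = round(len(prompts) * test_fraction)  # fraction applies to prompts, not rows
    test_prompts = set(prompts[:n_test])
    train = [row for row in rows if row["prompt"] not in test_prompts]
    test = [row for row in rows if row["prompt"] in test_prompts]
    return train, test


# Toy data: 20 prompts with 5 pairs each.
rows = [{"prompt": f"p{i % 20}", "pair": i} for i in range(100)]
train, test = split_by_prompt(rows)
print(len(train), len(test))  # 90 10
```

Because the fraction is applied to prompts, the row-level test share only approximates `0.1` when prompts yield different numbers of pairs, which is consistent with 637 of 6399 rows landing in the test split.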
Each row contains these key fields:
- `prompt`: Rendered conversation transcript used as the prompt context.
- `prompt_messages`: Original prompt message list from Coval.
- `chosen`: Preferred assistant response text.
- `rejected`: Less preferred assistant response text.
- `difficulty`: Difficulty score in `[0, 1]`; larger values mean a harder preference, i.e. a smaller mean-rank gap between the two responses.
- `rank_margin`: Difference between the rejected and chosen mean ranks (`rejected_mean_rank - chosen_mean_rank`), so it is always positive; larger values mean a stronger preference.
- `chosen_mean_rank` / `rejected_mean_rank`: Mean ranks of the chosen and rejected responses, averaged across the prompt's `world` assessments (lower is better).
- `num_world_assessments`: Number of world-ranking assessments used for that prompt.
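Because each row carries `rank_margin` and `difficulty`, a consumer can trade dataset size for label confidence. The sketch below assumes rows are plain Python dicts with the field names listed above (for example, records read from the parquet files); the function name `dpo_triples` and its thresholds are hypothetical, and the toy rows are illustrative rather than taken from the dataset.

```python
def dpo_triples(rows: list[dict], min_margin: float = 0.0, max_difficulty: float = 1.0):
    """Return (prompt, chosen, rejected) triples, keeping only sufficiently clear pairs."""
    kept = []
    for row in rows:
        # rank_margin = rejected_mean_rank - chosen_mean_rank; larger is a stronger preference.
        # difficulty in [0, 1]; larger means a smaller mean-rank gap.
        if row["rank_margin"] >= min_margin and row["difficulty"] <= max_difficulty:
            kept.append((row["prompt"], row["chosen"], row["rejected"]))
    return kept


# Toy rows (illustrative values only).
rows = [
    {"prompt": "p1", "chosen": "a", "rejected": "b", "rank_margin": 2.0, "difficulty": 0.1},
    {"prompt": "p1", "chosen": "a", "rejected": "c", "rank_margin": 0.5, "difficulty": 0.8},
]
print(dpo_triples(rows, min_margin=1.0))  # [('p1', 'a', 'b')]
```

With the default thresholds every row is kept; raising `min_margin` or lowering `max_difficulty` drops the pairs on which annotators' aggregated rankings were closest.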
## Construction Notes
- Only `world` ranking blocks are used.
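- Ranking strings like `A>B>C=D` are converted into numeric ranks with average-rank tie handling: tied responses share the mean of the positions they occupy.
- Pairwise examples are created for every strictly ordered response pair implied by the aggregated mean ranks; pairs with equal mean ranks produce no example.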
- Train/test splitting happens at the prompt level to avoid prompt leakage across splits.
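The ranking-string conversion can be sketched as below. This assumes the strings use only `>` for strict order and `=` for ties, with single-token labels; the full grammar of Coval ranking strings is not specified here, and `parse_ranking` is a hypothetical name.

```python
def parse_ranking(ranking: str) -> dict[str, float]:
    """Convert e.g. 'A>B>C=D' into {label: rank}, giving tied labels their average position."""
    ranks: dict[str, float] = {}
    position = 1  # next 1-based position to fill
    for group in ranking.split(">"):
        labels = [label.strip() for label in group.split("=")]
        # A tie group occupying positions p..p+k-1 gets their mean, p + (k - 1) / 2.
        tied_rank = position + (len(labels) - 1) / 2
        for label in labels:
            ranks[label] = tied_rank
        position += len(labels)
    return ranks


print(parse_ranking("A>B>C=D"))  # {'A': 1.0, 'B': 2.0, 'C': 3.5, 'D': 3.5}
```

This matches the "average" tie method used by common ranking utilities such as `scipy.stats.rankdata(method="average")`: C and D occupy positions 3 and 4, so each receives 3.5.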
Provided by:
sumuks



