iknow-lab/JudgeBias-DPO-RefFree
收藏Hugging Face2026-03-12 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/iknow-lab/JudgeBias-DPO-RefFree
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
size_categories:
- 100K<n<1M
task_categories:
- text-generation
tags:
- dpo
- preference
- llm-as-a-judge
- debiasing
- materials-science
dataset_info:
features:
- name: prompt
dtype: string
- name: chosen
dtype: string
- name: rejected
dtype: string
- name: score_chosen
dtype: float64
- name: score_rejected
dtype: float64
- name: score_delta
dtype: float64
- name: anchor_score
dtype: float64
- name: sample_id
dtype: int64
- name: perturbation_type
dtype: string
- name: perturbation_category
dtype: string
- name: perturbation_rate
dtype: float64
- name: chosen_model
dtype: string
- name: rejected_model
dtype: string
splits:
- name: train
num_bytes: 525031934
num_examples: 91639
- name: validation
num_bytes: 58266972
num_examples: 10183
config_name: default
configs:
- config_name: default
data_files:
- split: train
path: train.parquet
- split: validation
path: validation.parquet
---
# JudgeBias-DPO: Reference-Free Judge Debiasing Dataset
A DPO dataset for training LLM judges to evaluate materials science synthesis recipes without bias in a **reference-free** setting (no ground truth recipe).
## Motivation
LLM-as-a-Judge models exhibit systematic biases when evaluating AI-generated synthesis recipes:
- **Representational bias**: Penalizing semantically equivalent surface-form changes (e.g., chemical formula vs. IUPAC name)
- **Error insensitivity**: Failing to detect injected scientific errors (e.g., element substitutions, wrong temperatures)
This dataset trains judges to be **invariant to representational changes** while remaining **sensitive to scientific errors**.
## Construction: Anchor-Consensus
**Source**: 2,000 samples from [AlchemyBench](https://github.com/AiChemistLab/AlchemyBench), evaluated by 4 judge models (Qwen3-8B, Qwen3-32B, Llama-3.1-8B-Instruct, gemini-2.5-flash) across 17 perturbation datasets (9 error + 8 representational).
**Anchor score**: Per-sample robust quality estimate computed as `median(4 models × 5 representational rates)` — up to 20 evaluations per sample.
**Direction-aware pairing**: For each C(4,2)=6 model pair per sample:
- **Representational** (meaning preserved): `chosen` = higher score (closer to anchor), `rejected` = lower score
- **Error** (errors injected): `chosen` = lower score (detected errors), `rejected` = higher score (missed errors)
**Filtering**: score delta >= 0.5, anchor-based quality filter, max 5 pairs per sample per dataset, SHA-256 dedup.
## Dataset Format
Compatible with [TRL DPOTrainer](https://huggingface.co/docs/trl/dpo_trainer) conversational format.
| Field | Description |
|---|---|
| `prompt` | `[{system: judge_prompt}, {user: evaluation_request}]` (JSON string) |
| `chosen` | `[{assistant: unbiased_evaluation}]` (JSON string) |
| `rejected` | `[{assistant: biased_evaluation}]` (JSON string) |
| `score_chosen/rejected` | Overall score (1-5) |
| `score_delta` | Absolute score difference |
| `anchor_score` | Per-sample anchor from representational consensus |
| `perturbation_category` | `error` or `represent` |
## Statistics
| Metric | Value |
|---|---|
| Total pairs | 101,879 |
| Train / Validation | 91,639 / 10,183 |
| Error / Representational | 59,049 (58%) / 42,830 (42%) |
| Unique samples | 2,000 |
| Score delta | mean=1.06, median=0.9 |
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("iknow-lab/JudgeBias-DPO-RefFree")
train = dataset["train"]
val = dataset["validation"]
```
提供机构:
iknow-lab



